Search | arXiv e-print repository

Locally Differentially Private In-Context Learning

Authors: Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou

Abstract: Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability. An important application in deploying large language models is to augment LLMs with a private database for some specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data and their prompt data are vulnerable to membership inference attacks (MIA) and prompt leaking attacks. To deal with this problem, we treat LLMs as untrusted in privacy and propose a locally differentially private framework of in-context learning (LDP-ICL) in the settings where labels are sensitive. Considering the mechanisms of in-context learning in Transformers by gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. In the end, we perform several experiments to demonstrate our analysis results.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: This paper was published at LREC-Coling 2024
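
Note: the abstract above does not say which local mechanism LDP-ICL uses to protect labels. As a minimal sketch, assuming k-ary randomized response (a standard locally differentially private mechanism, chosen here only for illustration), the snippet below perturbs each label on the data owner's side and then debiases the noisy label counts to estimate a discrete distribution. The function names and parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng):
    """Report the true label with probability e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 labels uniformly at random."""
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    others = [c for c in range(num_classes) if c != label]
    return rng.choice(others)


def estimate_distribution(noisy_labels, num_classes, epsilon):
    """Unbiased frequency estimates from randomized-response reports.

    With p = P(report c | true c) and q = P(report c | true c' != c),
    the observed frequency is f*p + (1 - f)*q, so f = (observed - q) / (p - q).
    """
    n = len(noisy_labels)
    e = math.exp(epsilon)
    p = e / (e + num_classes - 1)
    q = 1.0 / (e + num_classes - 1)
    estimates = []
    for c in range(num_classes):
        observed = sum(1 for y in noisy_labels if y == c) / n
        estimates.append((observed - q) / (p - q))
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [0] * 600 + [1] * 300 + [2] * 100
    noisy = [randomized_response(y, 3, 1.0, rng) for y in true_labels]
    print(estimate_distribution(noisy, 3, 1.0))
```

Smaller values of epsilon give stronger privacy but noisier estimates, which is the privacy-utility trade-off the abstract refers to. Individual estimates can fall slightly outside [0, 1] and may need clipping in practice.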

arXiv:2405.02863 [pdf, other]

doi 10.3847/1538-4365/ad4b17

Stellar X-ray activity and habitability revealed by ROSAT sky survey

Authors: Henggeng Han, Song Wang, Chuanjie Zheng, Xue Li, Kai Xiao, Jifeng Liu

Abstract: Using the homogeneous X-ray catalog from ROSAT observations, we conducted a comprehensive investigation into stellar X-ray activity-rotation relations for both single and binary stars. Generally, the relation for single stars consists of two distinct regions: a weak decay region, indicating a continued dependence of the magnetic dynamo on stellar rotation rather than a saturation regime with constant activity, and a rapid decay region, where X-ray activity is strongly correlated with the Rossby number. Detailed analysis reveals more fine structures within the relation: in the extremely fast rotating regime, a decrease in X-ray activity was observed with increasing rotation rate, referred to as super-saturation, while in the extremely slow rotating region, the relation flattens, mainly due to the scattering of F stars. This scattering may result from intrinsic variability in stellar activities over one stellar cycle or the presence of different dynamo mechanisms. Binaries exhibit a similar relation to that of single stars while the limited sample size prevented the identification of fine structures in the relation for binaries. We calculated the mass loss rates of planetary atmosphere triggered by X-ray emissions from host stars. Our findings indicate that for an Earth-like planet within the stellar habitable zone, it would easily lose its entire primordial H/He envelope (equating to about 1% of the planetary mass).

Submitted 20 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 17 pages, 12 figures, ApJS accepted

arXiv:2405.02354 [pdf]

Heterogeneous network and graph attention auto-encoder for LncRNA-disease association prediction

Authors: Jin-Xing Liu, Wen-Yu Xi, Ling-Yun Dai, Chun-Hou Zheng, Ying-Lian Gao

Abstract: The emerging research shows that lncRNAs are associated with a series of complex human diseases. However, most of the existing methods have limitations in identifying nonlinear lncRNA-disease associations (LDAs), and it remains a huge challenge to predict new LDAs. Therefore, the accurate identification of LDAs is very important for the warning and treatment of diseases. In this work, multiple sources of biomedical data are fully utilized to construct characteristics of lncRNAs and diseases, and linear and nonlinear characteristics are effectively integrated. Furthermore, a novel deep learning model based on graph attention automatic encoder is proposed, called HGATELDA. To begin with, the linear characteristics of lncRNAs and diseases are created by the miRNA-lncRNA interaction matrix and miRNA-disease interaction matrix. Following this, the nonlinear features of diseases and lncRNAs are extracted using a graph attention auto-encoder, which largely retains the critical information and effectively aggregates the neighborhood information of nodes. In the end, LDAs can be predicted by fusing the linear and nonlinear characteristics of diseases and lncRNAs. The HGATELDA model achieves an impressive AUC value of 0.9692 when evaluated using 5-fold cross-validation, indicating its superior performance in comparison to several recent prediction models. Meanwhile, the effectiveness of HGATELDA in identifying novel LDAs is further demonstrated by case studies. The HGATELDA model appears to be a viable computational model for predicting LDAs.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures

ACM Class: I.2.4; I.2.6; I.2.m

arXiv:2405.02287 [pdf, other]

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Authors: Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay

Abstract: We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, our hard set contains >50% questions that all frontier models answer incorrectly. We explore the nuances of designing, evaluating, and ranking models on ultra challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates to human judgment. We offer free API access for the purpose of lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on the Vibe-Eval's automatic scores. We release the evaluation code and data, see https://github.com/reka-ai/reka-vibe-eval

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01053 [pdf, other]

Explicitly Modeling Universality into Self-Supervised Learning

Authors: Jingyao Wang, Wenwen Qiang, Zeen Song, Lingyu Si, Jiangmeng Li, Changwen Zheng, Bing Su

Abstract: The goal of universality in self-supervised learning (SSL) is to learn universal representations from unlabeled data and achieve excellent performance on all samples and tasks. However, these methods lack explicit modeling of the universality in the learning objective, and the related theoretical understanding remains limited. This may cause models to overfit in data-scarce situations and generalize poorly in real life. To address these issues, we provide a theoretical definition of universality in SSL, which constrains both the learning and evaluation universality of the SSL models from the perspective of discriminability, transferability, and generalization. Then, we propose a $σ$-measurement to help quantify the score of one SSL model's universality. Based on the definition and measurement, we propose a general SSL framework, called GeSSL, to explicitly model universality into SSL. It introduces a self-motivated target based on $σ$-measurement, which enables the model to find the optimal update direction towards universality. Extensive theoretical and empirical evaluations demonstrate the superior performance of GeSSL.

Submitted 23 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 28 pages, submitted to ICML24 with 7766

arXiv:2404.19620 [pdf, other]

Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference

Authors: Haoxuan Li, Chunyuan Zheng, Sihao Ding, Peng Wu, Zhi Geng, Fuli Feng, Xiangnan He

Abstract: Selection bias in recommender systems arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but ignore the fact that potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, named the neighborhood effect. To fill the gap, this paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference and introduces a treatment representation to capture the neighborhood effect. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of the neighborhood effect. We further develop two new estimators for estimating the proposed ideal loss. We theoretically establish the connection between the proposed and previous debiasing methods ignoring the neighborhood effect, showing that the proposed methods can achieve unbiased learning when both selection bias and the neighborhood effect are present, while the existing methods are biased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed methods.

Submitted 30 April, 2024; originally announced April 2024.

Comments: ICLR 24

arXiv:2404.19596 [pdf, other]

Debiased Collaborative Filtering with Kernel-Based Causal Balancing

Authors: Haoxuan Li, Chunyuan Zheng, Yanghao Xiao, Peng Wu, Zhi Geng, Xu Chen, Peng Cui

Abstract: Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. To solve this problem, one of the simple and effective methods is based on the propensity score, which adjusts the observational sample distribution to the target one by reweighting observed instances. Ideally, propensity scores should be learned with causal balancing constraints. However, existing methods usually ignore such constraints or implement them with unreasonable approximations, which may affect the accuracy of the learned propensity scores. To bridge this gap, in this paper, we first analyze the gaps between the causal balancing requirements and existing methods such as learning the propensity with cross-entropy loss or manually selecting functions to balance. Inspired by these gaps, we propose to approximate the balancing functions in reproducing kernel Hilbert space and demonstrate that, based on the universal property and representer theorem of kernel functions, the causal balancing constraints can be better satisfied. Meanwhile, we propose an algorithm that adaptively balances the kernel function and theoretically analyze the generalization error bound of our methods. We conduct extensive experiments to demonstrate the effectiveness of our methods, and to promote this research direction, we have released our project at https://github.com/haoxuanli-pku/ICLR24-Kernel-Balancing.

Submitted 30 April, 2024; originally announced April 2024.

Comments: ICLR 24 Spotlight
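
Note: the propensity-score reweighting that the abstract above takes as its starting point can be sketched as an inverse-propensity-scored (IPS) loss. This is only the generic baseline estimator, not the paper's kernel-balancing method. The function name and argument layout are hypothetical, and the propensities are assumed to be given rather than learned.

```python
def ips_loss(errors, observed, propensities):
    """Inverse-propensity-scored average of per-pair prediction errors.

    errors[i]:       prediction error for user-item pair i (used only if observed)
    observed[i]:     1 if the rating for pair i was observed, else 0
    propensities[i]: estimated probability that pair i is observed
    """
    total = 0.0
    for err, obs, p in zip(errors, observed, propensities):
        if obs:
            # Rarely observed pairs (small p) are up-weighted so that the
            # observed sample mimics the full user-item population.
            total += err / p
    # Normalize by the number of all pairs, not just the observed ones.
    return total / len(errors)


if __name__ == "__main__":
    errors = [1.0, 4.0, 2.0, 3.0]
    observed = [1, 0, 1, 0]
    propensities = [0.5, 0.5, 0.25, 0.5]
    print(ips_loss(errors, observed, propensities))
```

The estimate is only as good as the propensities fed into it, which is why the abstract focuses on how those scores are learned.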

arXiv:2404.16792 [pdf, other]

Weak-to-Strong Extrapolation Expedites Alignment

Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

Abstract: The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the literature on model interpolation, we propose a simple method called ExPO to boost LLMs' alignment with human preference. Utilizing a model that has undergone alignment training (e.g., via DPO or RLHF) and its initial SFT checkpoint, ExPO directly obtains a better-aligned model by extrapolating from the weights of the initial and the aligned models, which implicitly optimizes the alignment objective via first-order approximation. Through experiments with twelve open-source LLMs on HuggingFace, we demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models, as evaluated on the mainstream LLM benchmarks AlpacaEval 2.0 and MT-Bench. Moreover, ExPO exhibits remarkable scalability across various model sizes (from 1.8B to 70B) and capabilities. Through controlled experiments and further empirical analyses, we shed light on the essence of ExPO amplifying the reward signal learned during alignment training. Our work demonstrates the efficacy of model extrapolation in expediting the alignment of LLMs with human preference, suggesting a promising direction for future research.

Submitted 22 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Add theoretical explanation and more evaluation results
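
Note: the weight extrapolation described in the abstract above can be sketched as moving past the aligned checkpoint along the direction from the SFT checkpoint to the aligned one. The snippet below is a minimal illustration on plain Python lists. The function name, the dictionary layout, and the step size `alpha` are assumptions made for this example, and the abstract does not state which extrapolation strength the authors use.

```python
def extrapolate_weights(sft, aligned, alpha):
    """Return aligned + alpha * (aligned - sft) for every parameter tensor.

    sft, aligned: dicts mapping parameter names to flat lists of floats
    alpha:        extrapolation strength (alpha = 0 returns the aligned model)
    """
    extrapolated = {}
    for name, aligned_values in aligned.items():
        sft_values = sft[name]
        extrapolated[name] = [
            a + alpha * (a - s) for s, a in zip(sft_values, aligned_values)
        ]
    return extrapolated


if __name__ == "__main__":
    sft = {"w": [1.0, 2.0]}
    aligned = {"w": [2.0, 2.0]}
    print(extrapolate_weights(sft, aligned, 0.5))
```

No gradient computation or training data is involved, which matches the abstract's claim that the method bypasses additional training.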

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.13026 [pdf, other]

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Authors: Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman

Abstract: Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Project website at: https://physdreamer.github.io/

arXiv:2404.12387 [pdf, other]

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Authors: Reka Team, Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu , et al. (1 additional authors not shown)

Abstract: We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized values for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core performs competitively to GPT4-V. Meanwhile, on multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively to other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai . A showcase of non cherry picked qualitative examples can also be found at http://showcase.reka.ai .

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12024 [pdf, other]

Meta-Auxiliary Learning for Micro-Expression Recognition

Authors: Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

Abstract: Micro-expressions (MEs) are involuntary movements revealing people's hidden feelings, which have attracted considerable interest for their objectivity in emotion detection. However, despite its wide applications in various scenarios, micro-expression recognition (MER) remains a challenging problem in real life due to three reasons, including (i) data-level: lack of data and imbalanced classes, (ii) feature-level: subtle, rapidly changing, and complex features of MEs, and (iii) decision-making-level: impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process: (i) In the first level, it obtains task-specific MER knowledge by learning in two branches, where the first branch is for learning MER features via primary MER tasks, while the other branch is for guiding the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, given their resemblance in both spatial and temporal behavioral patterns. The two branches of learning jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding learning noise or superficial connections between MEs and emotions that may damage its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 10 pages, 7 figures, 3 tables

arXiv:2404.10337 [pdf, other]

Intriguing Properties of Positional Encoding in Time Series Forecasting

Authors: Jianqi Zhang, Jingyao Wang, Wenwen Qiang, Fanjiang Xu, Changwen Zheng, Fuchun Sun, Hui Xiong

Abstract: Transformer-based methods have made significant progress in time series forecasting (TSF). They primarily handle two types of tokens, i.e., temporal tokens that contain all variables of the same timestamp, and variable tokens that contain all input time points for a specific variable. Transformer-based methods rely on positional encoding (PE) to mark tokens' positions, facilitating the model to perceive the correlation between tokens. However, in TSF, research on PE remains insufficient. To address this gap, we conduct experiments and uncover intriguing properties of existing PEs in TSF: (i) The positional information injected by PEs diminishes as the network depth increases; (ii) Enhancing positional information in deep networks is advantageous for improving the model's performance; (iii) PE based on the similarity between tokens can improve the model's performance. Motivated by these findings, we introduce two new PEs: Temporal Position Encoding (T-PE) for temporal tokens and Variable Positional Encoding (V-PE) for variable tokens. Both T-PE and V-PE incorporate geometric PE based on tokens' positions and semantic PE based on the similarity between tokens but using different calculations. To leverage both the PEs, we design a Transformer-based dual-branch framework named T2B-PE. It first calculates temporal tokens' correlation and variable tokens' correlation respectively and then fuses the dual-branch features through the gated unit. Extensive experiments demonstrate the superior robustness and effectiveness of T2B-PE. The code is available at: https://github.com/jlu-phyComputer/T2B-PE.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.07127 [pdf, other]

Searching for short-period variables in M31: method and catalogs

Authors: Hongrui Gu, Haibo Yuan, Subo Dong, Chenfa Zheng, Shenzhe Cui, Yi Ren, Haozhu Fu, Yang Huang, Zhou Fan

Abstract: Utilizing high-cadence and continuous g- and r-band data over three nights acquired from the 3.6-meter Canada France Hawaii Telescope (CFHT) aimed to find short-duration microlensing events, we conduct a systematic search for variables, transients, and asteroids across a $\sim1^\circ$ field of view of the Andromeda Galaxy (M 31). We present a catalog of 5859 variable stars, yielding the most extensive compilation of short-period variable sources of M 31. We also detected 19 flares, predominantly associated with foreground M dwarfs in the Milky Way. In addition, we discovered 17 previously unknown asteroid candidates, and we subsequently reported them to the Minor Planet Center. Lastly, we report a microlensing event candidate C-ML-1 and present a preliminary analysis.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06988 [pdf, other]

Quantum Network Tomography via Learning Isometries on Stiefel Manifold

Authors: Ze-Tong Li, Xin-Lin He, Cong-Cong Zheng, Yu-Qian Dong, Tian Luan, Xu-Tao Yu, Zai-Chen Zhang

Abstract: Explicit mathematical reconstructions of quantum networks play a significant role in developing quantum information science. However, tremendous parameter requirements and physical constraint implementations have become computationally non-ignorable encumbrances. In this work, we propose an efficient method for quantum network tomography by learning isometries on the Stiefel manifold. Tasks of reconstructing quantum networks are tackled by solving a series of unconstrained optimization problems with significantly fewer parameters. The stepwise isometry estimation shows the capability for providing information of the truncated quantum comb while processing the tomography. Remarkably, this method enables the compressive quantum comb tomography by specifying the dimensions of isometries. As a result, our proposed method exhibits high accuracy and efficiency.

Submitted 6 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05242 [pdf, other]

Collision-Free Trajectory Optimization in Cluttered Environments with Sums-of-Squares Programming

Authors: Yulin Li, Chunxin Zheng, Kai Chen, Yusen Xie, Xindong Tang, Michael Yu Wang, Jun Ma

Abstract: In this work, we propose a trajectory optimization approach for robot navigation in cluttered 3D environments. We represent the robot's geometry as a semialgebraic set defined by polynomial inequalities such that robots with general shapes can be suitably characterized. To address the robot navigation task in obstacle-dense environments, we exploit the free space directly to construct a sequence of free regions, and allocate each waypoint on the trajectory to a specific region. Then, we incorporate a uniform scaling factor for each free region, and formulate a Sums-of-Squares (SOS) optimization problem that renders the containment relationship between the robot and the free space computationally tractable. The SOS optimization problem is further reformulated to a semidefinite program (SDP), and the collision-free constraints are shown to be equivalent to limiting the scaling factor along the entire trajectory. In this context, the robot at a specific configuration is tailored to stay within the free region. Next, to solve the trajectory optimization problem with the proposed safety constraints (which are implicitly dependent on the robot configurations), we derive the analytical solution to the gradient of the minimum scaling factor with respect to the robot configuration. As a result, this seamlessly facilitates the use of gradient-based methods in efficient solving of the trajectory optimization problem. Through a series of simulations and real-world experiments, the proposed trajectory optimization approach is validated in various challenging scenarios, and the results demonstrate its effectiveness in generating collision-free trajectories in dense and intricate environments populated with obstacles.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04922 [pdf, other]

Efficient Learnable Collaborative Attention for Single Image Super-Resolution

Authors: Yigang Zhao, Chaowei Zheng, Jiannan Su, Guangyong Chen, Min Gan

Abstract: Non-Local Attention (NLA) is a powerful technique for capturing long-range feature correlations in deep single image super-resolution (SR). However, NLA suffers from high computational complexity and memory consumption, as it requires aggregating all non-local feature information for each query response and recalculating the similarity weight distribution for different abstraction levels of features. To address these challenges, we propose a novel Learnable Collaborative Attention (LCoA) that introduces inductive bias into non-local modeling. Our LCoA consists of two components: Learnable Sparse Pattern (LSP) and Collaborative Attention (CoA). LSP uses the k-means clustering algorithm to dynamically adjust the sparse attention pattern of deep features, which reduces the number of non-local modeling rounds compared with existing sparse solutions. CoA leverages the sparse attention pattern and weights learned by LSP, and co-optimizes the similarity matrix across different abstraction levels, which avoids redundant similarity matrix calculations. The experimental results show that our LCoA can reduce the non-local modeling time by about 83% in the inference stage. In addition, we integrate our LCoA into a deep Learnable Collaborative Attention Network (LCoAN), which achieves competitive performance in terms of inference time, memory consumption, and reconstruction quality compared with other state-of-the-art SR methods.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04835 [pdf, other]

A born ultramassive white dwarf-hot subdwarf super-Chandrasekhar candidate

Authors: Changqing Luo, Jiao Li, Chuanjie Zheng, Dongdong Liu, Zhenwei Li, Yangping Luo, Peter Nemeth, Bo Zhang, Jianping Xiong, Bo Wang, Song Wang, Yu Bai, Qingzheng Li, Pei Wang, Zhanwen Han, Jifeng Liu, Yang Huang, Xuefei Chen, Chao Liu

Abstract: Although a supernova is a well-known endpoint of an accreting white dwarf, alternative theoretical possibilities have been discussed broadly, such as the accretion-induced collapse (AIC) event as the endpoint of oxygen-neon (ONe) white dwarfs, either accreting up to or merging to exceed the Chandrasekhar limit (the maximum mass of a stable white dwarf). AIC is an important channel to form neutron stars, especially for those unusual systems which are hardly produced by core-collapse supernovae. However, the observational evidence for this theoretically predicted event and its progenitor is very limited. In all of the known progenitors, white dwarfs increase in mass by accretion. Here, we report the discovery of an intriguing binary system, Lan 11, consisting of a stripped core-helium-burning hot subdwarf and an unseen compact object of 1.08 to 1.35 $M_{\odot}$. Our binary population synthesis calculations, along with the absence of detection from the deep radio observations of the Five-hundred-meter Aperture Spherical Radio Telescope, strongly suggest that the latter is an ONe white dwarf. The total mass of this binary is 1.67 to 1.92 $M_{\odot}$, significantly exceeding the Chandrasekhar limit. The reproduction of its evolutionary history indicates that the unique system has undergone two phases of common envelope ejections, implying a born nature of this massive ONe white dwarf rather than an accretion growth from its companion.
These results, together with the short orbital period of this binary (3.65 hours), suggest that this system will merge in 500-540 Myr, most likely triggering an AIC event, although the possibility of a type Ia supernova cannot be fully ruled out. This finding provides valuable constraints on our understanding of stellar endpoints, whether it leads to an AIC or a supernova.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 25 pages, 14 figures

arXiv:2404.03229 [pdf, other]

Relation between the keV-MeV and TeV emission of GRB 221009A and its implications

Authors: Yan-Qiu Zhang, Hao-Xiang Lin, Shao-Lin Xiong, Zhuo Li, Ming-Yu Ge, Chen-Wei Wang, Shu-Xu Yi, Zhen Zhang, Shuang-Nan Zhang, Li-Ming Song, Chao Zheng, Wang-Chen Xue, Jia-Cong Liu, Wen-Jun Tan, Yue Wang, Wen-Long Zhang

Abstract: Gamma-ray bursts (GRBs) are believed to launch relativistic jets, which generate prompt emission by their internal processes and drive external shocks into the surrounding medium, accounting for the long-lasting afterglow emission. However, how the jet powers the external shock is an open question. The unprecedented observations of the keV-MeV emission with GECAM and the TeV emission with LHAASO of the brightest burst so far, GRB 221009A, offer a great opportunity to study the prompt-to-afterglow transition and the early dynamical evolution of the external shock. In this letter, we find that the cumulative light curve of keV-MeV emission could well fit the rising stage of the TeV light curve of GRB 221009A, with a time delay of $4.45^{+0.26}_{-0.26}$\,s for TeV emission. Moreover, both the rapid increase in the initial stage and the excess from about T+260\,s to 270\,s in the TeV light curve could be interpreted by inverse Compton (IC) scatterings of the inner-coming photons by the energetic electrons in the external shock. Our results not only reveal a close relation between the keV-MeV and TeV emission, but also indicate a continuous, rather than impulsive, energy injection to the external shock. Assuming an energy injection rate proportional to the keV-MeV flux, we build a continuous energy injection model which well fits the TeV light curve of GRB 221009A, and provides an estimate of the Lorentz factor of the jet.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.02145 [pdf, other]

Iterated Learning Improves Compositionality in Large Vision-Language Models

Authors: Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna

Abstract: A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, recent investigations find that most, if not all, of our state-of-the-art vision-language models struggle at compositionality. They are unable to distinguish between images of "a girl in white facing a man in black" and "a girl in black facing a man in white". Moreover, prior work suggests that compositionality doesn't arise with scale: larger model sizes or training data don't help. This paper develops a new iterated training algorithm that incentivizes compositionality. We draw on decades of cognitive science research that identifies cultural transmission (the need to teach a new generation) as a necessary inductive prior that incentivizes humans to develop compositional languages. Specifically, we reframe vision-language contrastive learning as the Lewis Signaling Game between a vision agent and a language agent, and operationalize cultural transmission by iteratively resetting one of the agents' weights during training. After every iteration, this training paradigm induces representations that become "easier to learn", a property of compositional languages: e.g. our model trained on CC3M and CC12M improves standard CLIP by 4.7% and 4.0%, respectively, on the SugarCrepe benchmark.

Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.00889 [pdf, other]

Transport of the magnetic flux away from a decaying sunspot via convective motions

Authors: Chenxi Zheng, Thierry Roudier, Brigitte Schmieder, Guiping Ruan, Jean-Marie Malherbe, Yang Liu, Yao Chen, Wenda Cao

Abstract: Aims. The aim of this paper is to consider the relationship between the decay of sunspots and convection via the motion of the family of granules and how the diffusion mechanism of the magnetic field operates in a decaying sunspot. Methods. We report the decay of a sunspot observed by the 1.6m Goode Solar Telescope (GST) with the TiO Broadband Filter Imager (BFI) and the Near-InfraRed Imaging Spectropolarimeter (NIRIS). The analysis was aided by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamic Observatory (SDO). In the first step, we followed the decay of the sunspot with HMI data over three days by constructing its evolving area and total magnetic flux. In the second step, the high spatial and temporal resolution of the GST instruments allowed us to analyze the causes of the decay of the sunspot. Afterward, we followed the emergence of granules in the moat region around the sunspot over six hours. The evolution of the trees of fragmenting granules (TFGs) was derived based on their relationship with the horizontal surface flows. Results. We find that the area and total magnetic flux display an exponential decrease over the course of the sunspot decay. We identified 22 moving magnetic features (MMFs) in the moats of pores, which is a signature of sunspot decay through diffusion. We note that the MMFs were constrained to follow the borders of TFGs during their journey away from the sunspot. Conclusions. The TFGs and their development contribute to the diffusion of the magnetic field outside the sunspot.
The conclusion of our analysis shows the important role of the TFGs in sunspot decay. Finally, the family of granules evacuates the magnetic field.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.19586 [pdf, other]

TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering

Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu

Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduced a Smooth loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead.
The code will be publicly available.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18296 [pdf, other]

GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm

Authors: Chunhang Zheng, Kechao Cai

Abstract: Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.

Submitted 14 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.16812 [pdf, other]

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16071 [pdf, other]

Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization

Authors: Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

Abstract: Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches greatly improve current lip reading systems. However, lip reading in cross-speaker scenarios, where the speaker identity changes, poses a challenging problem due to inter-speaker variability. A well-trained lip reading system may perform poorly when handling a brand new speaker. To learn a speaker-robust lip reading model, a key insight is to reduce visual variations across speakers, preventing the model from overfitting to specific speakers. In this work, in view of both input visual clues and latent representations based on a hybrid CTC/attention architecture, we propose to exploit the lip landmark-guided fine-grained visual clues instead of frequently-used mouth-cropped images as input features, diminishing speaker-specific appearance characteristics. Furthermore, a max-min mutual information regularization approach is proposed to capture speaker-insensitive latent representations. Experimental evaluations on public lip reading datasets demonstrate the effectiveness of the proposed approach under the intra-speaker and inter-speaker conditions.

Submitted 2 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: To appear in LREC-COLING 2024

Journal ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

arXiv:2403.15382 [pdf, other]

DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

Abstract: We introduce DragAPart, a method that, given an image and a set of drags as input, can generate a new image of the same object in a new state, compatible with the action of the drags. Unlike prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion model, not restricted to a specific kinematic structure or object category. To this end, we start from a pre-trained image generator and fine-tune it on a new synthetic dataset, Drag-a-Move, which we introduce. Combined with a new encoding for the drags and dataset randomization, the new model generalizes well to real images and different categories. Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Project page: https://dragapart.github.io/

arXiv:2403.14972 [pdf, other]

A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, achieving state-of-the-art results in Science QA and MMBench with significant improvements over previous methods.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Work in progress

arXiv:2403.14627 [pdf, other]

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Authors: Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

Abstract: We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. To accurately localize the Gaussian centers, we build a cost volume representation via plane sweeping, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We also learn other Gaussian primitives' parameters jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussians via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). More impressively, compared to the latest state-of-the-art method pixelSplat, MVSplat uses $10\times$ fewer parameters and infers more than $2\times$ faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.

Submitted 18 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: ECCV2024, Project page: https://donydchen.github.io/mvsplat, Code: https://github.com/donydchen/mvsplat

arXiv:2403.14619 [pdf, other]

ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition

Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu

Abstract: 3D decomposition/segmentation remains a challenge as large-scale 3D annotated data is not readily available. Contemporary approaches typically leverage 2D machine-generated segments, integrating them for 3D consistency. While the majority of these methods are based on NeRFs, they face a potential weakness: the instance/semantic embedding features derive from independent MLPs, thus preventing the segmentation network from learning the geometric details of the objects directly through radiance and density. In this paper, we propose ClusteringSDF, a novel approach to achieve both segmentation and reconstruction in 3D via the neural implicit surface representation, specifically the Signed Distance Function (SDF), where the segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although based on ObjectSDF++, ClusteringSDF no longer requires the ground-truth segments for supervision while maintaining the capability of reconstructing individual object surfaces, relying purely on the noisy and inconsistent labels from pre-trained models. As the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting the 2D labels to 3D, and the experimental results on the challenging scenes from ScanNet and Replica datasets show that ClusteringSDF can achieve competitive performance compared against the state-of-the-art with significantly reduced training time.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Project Page: https://sm0kywu.github.io/ClusteringSDF/

arXiv:2403.12851 [pdf, other]

doi 10.1007/s11433-023-2381-0

Observation of spectral lines in the exceptional GRB 221009A

Authors: Yan-Qiu Zhang, Shao-Lin Xiong, Ji-Rong Mao, Shuang-Nan Zhang, Wang-Chen Xue, Chao Zheng, Jia-Cong Liu, Zhen Zhang, Xi-Lu Wang, Ming-Yu Ge, Shu-Xu Yi, Li-Ming Song, Zheng-Hua An, Ce Cai, Xin-Qiao Li, Wen-Xi Peng, Wen-Jun Tan, Chen-Wei Wang, Xiang-Yang Wen, Yue Wang, Shuo Xiao, Fan Zhang, Peng Zhang, Shi-Jie Zheng

Abstract: As the brightest gamma-ray burst ever observed, GRB 221009A provided a precious opportunity to explore spectral line features. In this paper, we performed a comprehensive spectroscopy analysis of GRB 221009A jointly with GECAM-C and Fermi/GBM data to search for emission and absorption lines. For the first time we investigated the line feature throughout this GRB, including the brightest part where many instruments suffered problems, and identified prominent emission lines in multiple time intervals. The central energy of the Gaussian emission line evolves from about 37 MeV to 6 MeV, with a nearly constant ratio (about 10\%) between the line width and central energy. Particularly, we find that both the central energy and the energy flux of the emission line evolve with time as a power law decay with power law index of -1 and -2 respectively. We suggest that the observed emission lines most likely originate from the blue-shifted electron positron pair annihilation 511 keV line. We find that a standard high latitude emission scenario cannot fully interpret the observation, thus we propose that the emission line comes from some dense clumps with electron positron pairs traveling together with the jet. In this scenario, we can use the emission line to directly, for the first time, measure the bulk Lorentz factor of the jet ($Γ$) and reveal its time evolution (i.e. $Γ\sim t^{-1}$) during the prompt emission. Interestingly, we find that the flux of the annihilation line in the co-moving frame remains constant.
These discoveries of the spectral line features shed new and important light on the physics of GRBs and relativistic jets.

Submitted 28 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy (SCPMA)

Journal ref: Observation of spectral lines in the exceptional GRB 221009A. Sci. China-Phys. Mech. Astron. 67, 289511 (2024)

arXiv:2403.12327 [pdf, other]

GT-Rain Single Image Deraining Challenge Report

Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Each scene in GT-Rain comprises a real rainy image and a ground truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11449 [pdf, other]

Graph Partial Label Learning with Potential Cause Discovering

Authors: Hang Gao, Jiaguo Yuan, Jiangmeng Li, Peng Qiao, Fengge Wu, Changwen Zheng, Huaping Liu

Abstract: Graph Neural Networks (GNNs) have garnered widespread attention for their potential to address the challenges posed by graph representation learning, which faces complex graph-structured data across various domains. However, due to the inherent complexity and interconnectedness of graphs, accurately annotating graph data for training GNNs is extremely challenging. To address this issue, we have introduced Partial Label Learning (PLL) into graph representation learning. PLL is a critical weakly supervised learning problem where each training instance is associated with a set of candidate labels, including the ground-truth label and the additional interfering labels. PLL allows annotators to make errors, which reduces the difficulty of data labeling. Subsequently, we propose a novel graph representation learning method that enables GNN models to effectively learn discriminative information within the context of PLL. Our approach utilizes potential cause extraction to obtain graph data that holds causal relationships with the labels. By conducting auxiliary training based on the extracted graph data, our model can effectively eliminate the interfering information in the PLL scenario. We support the rationale behind our method with a series of theoretical analyses. Moreover, we conduct extensive evaluations and ablation studies on multiple datasets, demonstrating the superiority of our proposed method.

Submitted 21 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11310 [pdf, other]

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Authors: Qucheng Peng, Ce Zheng, Chen Chen

Abstract: 3D human pose data collected in controlled laboratory settings present challenges for pose estimators that generalize across diverse scenarios. To address this, domain generalization is employed. Current methodologies in domain generalization for 3D human pose estimation typically utilize adversarial training to generate synthetic poses for training. Nonetheless, these approaches exhibit several limitations. First, the lack of prior information about the target domain complicates the application of suitable augmentation through a single pose augmentor, affecting generalization on target domains. Moreover, adversarial training's discriminator tends to enforce similarity between source and synthesized poses, impeding the exploration of out-of-source distributions. Furthermore, the pose estimator's optimization is not exposed to domain shifts, limiting its overall generalization ability. To address these limitations, we propose a novel framework featuring two pose augmentors: the weak and the strong augmentors. Our framework employs differential strategies for generation and discrimination processes, facilitating the preservation of knowledge related to source poses and the exploration of out-of-source distributions without prior information about target poses. Besides, we leverage meta-optimization to simulate domain shifts in the optimization process of the pose estimator, thereby improving its generalization ability. Our proposed approach significantly outperforms existing methods, as demonstrated through comprehensive experiments on various benchmark datasets. Our code will be released at https://github.com/davidpengucf/DAF-DG.

Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.02635 [pdf, other]

PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning

Authors: Ke Zhang, DanDan Zhu, Qiuhan Xu, Hao Zhou, Ce Zheng

Abstract: Training for multi-agent reinforcement learning (MARL) is a time-consuming process caused by the distribution shift of each agent. One drawback is that the strategy of each agent in MARL is independent, although the agents actually cooperate. Thus, a vital issue in multi-agent reinforcement learning is how to efficiently accelerate the training process. To address this problem, current research has leveraged a centralized function (CF) across multiple agents to learn the contribution of the team reward for each agent. However, CF-based methods introduce joint error from other agents in the estimation of the value network. Inspired by federated learning, we propose three simple novel approaches called Average Periodically Parameter Sharing (A-PPS), Reward-Scalability Periodically Parameter Sharing (RS-PPS) and Partial Personalized Periodically Parameter Sharing (PP-PPS) to accelerate the training of MARL. Agents share the Q-value network periodically during the training process. Agents that have the same identity adapt the collected reward as scalability and update part of the neural network during each period to share different parameters. We apply our approaches to the classical MARL method QMIX and evaluate them on various tasks in the StarCraft Multi-Agent Challenge (SMAC) environment. Numerical experiments show a substantial enhancement, with an average improvement of 10%-30%, and enable winning tasks that QMIX cannot. Our code can be downloaded from https://github.com/ColaZhang22/PPS-QMIX

Submitted 4 March, 2024; originally announced March 2024.

Comments: 10 pages, 5 figures

arXiv:2403.02513 [pdf, other]

Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF

Authors: Chen Zheng, Ke Sun, Hang Wu, Chenguang Xi, Xun Zhou

Abstract: In recent advancements in Conversational Large Language Models (LLMs), a concerning trend has emerged, showing that many new base LLMs experience a knowledge reduction in their foundational capabilities following Supervised Fine-Tuning (SFT). This process often leads to issues such as forgetting or a decrease in the base model's abilities. Moreover, fine-tuned models struggle to align with user preferences, inadvertently increasing the generation of toxic outputs when specifically prompted. To overcome these challenges, we adopted an innovative approach by completely bypassing SFT and directly implementing Harmless Reinforcement Learning from Human Feedback (RLHF). Our method not only preserves the base model's general capabilities but also significantly enhances its conversational abilities, while notably reducing the generation of toxic outputs. Our approach holds significant implications for fields that demand a nuanced understanding and generation of responses, such as customer service. We applied this methodology to Mistral, the most popular base model, thereby creating Mistral-Plus. Our validation across 11 general tasks demonstrates that Mistral-Plus outperforms similarly sized open-source base models and their corresponding instruct versions. Importantly, the conversational abilities of Mistral-Plus were significantly improved, indicating a substantial advancement over traditional SFT models in both safety and user preference alignment.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19143 [pdf, ps, other]

Recurrence Theorem for Open Quantum Systems

Authors: Zhihang Liu, Chao Zheng

Abstract: Quantum (Poincaré) recurrence theorems are known for closed quantum (classical) systems. Can recurrence happen in open systems? We provide the recurrence theorem for open quantum systems via non-Hermitian (NH) description. We find that PT symmetry and pseudo-Hermitian symmetry protect recurrence for NH open quantum systems and the recurrence fails with the symmetry breaking. Applying our theorem to PT-symmetric systems, we reveal why quantum recurrence happens in PT-unbroken phase but fails in PT-broken phase, which was misunderstood before. A contradiction emerges when we apply our theorem to anti-PT symmetric systems and we settle it, revealing that distinguishability and von Neumann entropy are generally not effective to describe the information dynamics in NH systems. A new approach is developed to investigate the information dynamics of NH systems. For anti-PT symmetric systems in PT-broken phase, we find there are three information-dynamics patterns: oscillations with an overall decrease (increase), and periodic oscillations. The periodic oscillations (information complete retrieval) happen only if the spectrum of NH Hamiltonian is real. The three patterns degenerate to the periodic oscillation using distinguishability or von Neumann entropy because normalization of non-unitary evolved states leads to loss of information. We conclude with a discussion of the physical meaning behind the recurrence in open systems and give the direction of recurrence theorem not limited to conservative systems in classical mechanics.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17384 [pdf, other]

Ultraviolet and Chromospheric activity and Habitability of M stars

Authors: Xue Li, Song Wang, Henggeng Han, Huiqin Yang, Chuanjie Zheng, Yang Huang, Jifeng Liu

Abstract: M-type stars are crucial for stellar activity studies since they cover two types of magnetic dynamos and particularly intriguing for habitability studies due to their abundance and long lifespans during the main-sequence stage. In this paper, we used the LAMOST DR9 catalog and the GALEX UV archive data to investigate the chromospheric and UV activities of M-type stars. All the chromospheric and UV activity indices clearly show the saturated and unsaturated regimes and the well-known activity-rotation relation, consistent with previous studies. Both the FUV and NUV activity indices exhibit a single-peaked distribution, while the H$α$ and Ca II H&K indices show a distinct double-peaked distribution. The gap between these peaks suggests a rapid transition from a saturated population to an unsaturated one. The smoothly varying distributions of different subtypes suggest a rotation-dependent dynamo for both early-type (partly convective) and late-type (fully convective) M stars. We identified a group of stars with high UV activity above the saturation regime ($\log R^{\prime}_{\rm NUV} > -2.5$) but low chromospheric activity, and the underlying reason is unknown. By calculating the continuously habitable zone and the UV habitable zone for each star, we found about 70% of stars in the total sample and 40% of stars within 100 pc are located in the overlapping region of these two habitable zones, indicating a number of M stars are potentially habitable. Finally, we examined the possibility of UV activity studies of M stars using the China Space Station Telescope.

Submitted 27 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 27 pages, 32 figures, accepted by ApJ

arXiv:2402.13572 [pdf, other]

On the Expressive Power of a Variant of the Looped Transformer

Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu

Abstract: Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer block, dubbed Algorithm Transformer (abbreviated as AlgoFormer). Compared with the standard transformer and vanilla looped transformer, the proposed AlgoFormer can achieve significantly higher expressiveness in algorithm representation when using the same number of parameters. In particular, inspired by the structure of human-designed learning algorithms, our transformer block consists of a pre-transformer that is responsible for task pre-processing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of the AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms. Experimental results demonstrate the empirical superiority of the proposed transformer in that it outperforms the standard transformer and vanilla looped transformer in some challenging tasks.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.18018 [pdf, other]

On Prompt-Driven Safeguarding for Large Language Models

Authors: Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation. We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction, in which models become more prone to refusing to provide assistance, even when the queries are harmless. On the other hand, LLMs are naturally capable of distinguishing harmful and harmless queries without safety prompts. Inspired by these findings, we propose a method for safety prompt optimization, namely DRO (Directed Representation Optimization). Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness. Experiments with eight LLMs on out-of-domain and jailbreak benchmarks demonstrate that DRO remarkably improves the safeguarding performance of human-crafted safety prompts, without compromising the models' general performance.

Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: ICML 2024

arXiv:2401.17268 [pdf, other]

Weaver: Foundation Models for Creative Writing

Authors: Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu, Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing Wang, et al. (21 additional authors not shown)

Abstract: This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preference of professional writers using a suite of novel methods for instruction data synthesis and LLM alignment, making it able to produce more human-like texts and follow more diverse instructions for content creation. The Weaver family consists of models of Weaver Mini (1.8B), Weaver Base (6B), Weaver Pro (14B), and Weaver Ultra (34B) sizes, suitable for different applications and can be dynamically dispatched by a routing agent according to query complexity to balance response quality and computation cost. Evaluation on a carefully curated benchmark for assessing the writing capabilities of LLMs shows Weaver models of all sizes outperform generalist LLMs several times larger than them. Notably, our most-capable Weaver Ultra model surpasses GPT-4, a state-of-the-art generalist LLM, on various writing scenarios, demonstrating the advantage of training specialized LLMs for writing purposes. Moreover, Weaver natively supports retrieval-augmented generation (RAG) and function calling (tool usage). We present various use cases of these abilities for improving AI-assisted writing systems, including integration of external knowledge bases, tools, or APIs, and providing personalized writing assistance. Furthermore, we discuss and summarize a guideline and best practices for pre-training and fine-tuning domain-specific LLMs.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.14915 [pdf, other]

Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students

Authors: Chengbo Zheng, Kangyu Yuan, Bingcan Guo, Reza Hadi Mogavi, Zhenhui Peng, Shuai Ma, Xiaojuan Ma

Abstract: The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate about an alternative world where they could freely employ AI in PBL while needing to report this process to assess their skills and contributions. Our workshops yielded various scenarios of students' use of AI in PBL and ways of analyzing these uses grounded in students' vision of education goal transformation. We also found students with different attitudes toward AI exhibited distinct preferences in how to analyze and understand the use of AI. Based on these findings, we discuss future research opportunities on student-AI interactions and understanding AI-enhanced learning.

Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Conditionally accepted by CHI '24

arXiv:2401.14857 [pdf, other]

LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering

Authors: Sheng Hong, Junjie He, Xinhu Zheng, Chunran Zheng, Shaojie Shen

Abstract: We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multimodal sensor fused mapping system that builds on the differentiable Gaussians to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion. This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initialization for the scene's surface Gaussians and the sensor's poses of each frame are obtained using a LiDAR-inertial system with the feature of size-adaptive voxels. Then, we optimized and refined the Gaussians using visual-derived photometric gradients to optimize their quality and density. Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes. It bolsters structure construction through LiDAR and facilitates real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics domains. We release our software and hardware and self-collected datasets to benefit the community.

Submitted 16 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14166 [pdf, other]

BayesPrompt: Prompting Large-Scale Pre-Trained Language Models on Few-shot Inference via Debiased Domain Abstraction

Authors: Jiangmeng Li, Fei Song, Yifan Jin, Wenwen Qiang, Changwen Zheng, Fuchun Sun, Hui Xiong

Abstract: As a novel and effective fine-tuning paradigm based on large-scale pre-trained language models (PLMs), prompt-tuning aims to reduce the gap between downstream tasks and pre-training objectives. While prompt-tuning has yielded continuous advancements in various tasks, such an approach still remains a persistent defect: prompt-tuning methods fail to generalize to specific few-shot patterns. From the perspective of distribution analyses, we disclose that the intrinsic issues behind the phenomenon are the over-multitudinous conceptual knowledge contained in PLMs and the abridged knowledge for target downstream domains, which jointly result in that PLMs mis-locate the knowledge distributions corresponding to the target domains in the universal knowledge embedding space. To this end, we intuitively explore to approximate the unabridged target domains of downstream tasks in a debiased manner, and then abstract such domains to generate discriminative prompts, thereby providing the de-ambiguous guidance for PLMs. Guided by such an intuition, we propose a simple yet effective approach, namely BayesPrompt, to learn prompts that contain the domain discriminative information against the interference from domain-irrelevant knowledge. BayesPrompt primitively leverages known distributions to approximate the debiased factual distributions of target domains and further uniformly samples certain representative features from the approximated distributions to generate the ultimate prompts for PLMs. We provide theoretical insights with the connection to domain adaptation. Empirically, our method achieves state-of-the-art performance on benchmarks.

Submitted 20 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted by ICLR2024

arXiv:2401.12785 [pdf, other]

Extended imaginary gauge transformation in a general nonreciprocal lattice

Authors: Yunyao Qi, Jinghui Pi, Yuquan Wu, Heng Lin, Chao Zheng, Guilu Long

Abstract: Imaginary gauge transformation (IGT) provides a clear understanding of the non-Hermitian skin effect by transforming the non-Hermitian Hamiltonians with real spectra into Hermitian ones. In this work, we extend this approach to the complex spectrum regime in a general nonreciprocal lattice model. We unveil that the validity of IGT hinges on a class of pseudo-Hermitian symmetry. The generalized Brillouin zone of Hamiltonians respecting such pseudo-Hermiticity is demonstrated to be a circle, which enables easy access to the continuum bands, localization length of skin modes, and relevant topological numbers. Furthermore, we investigate the applicability of IGT and the underlying pseudo-Hermiticity beyond nearest-neighbour hopping, offering a graphical interpretation. Our theoretical framework is applied to establish bulk-boundary correspondence in the nonreciprocal trimer Su-Schrieffer-Heeger model and analyze the localization behaviors of skin modes in the two-dimensional Hatano-Nelson model.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 16 pages, 6 figures

arXiv:2401.10973 [pdf, other]

T2MAC: Targeted and Trusted Multi-Agent Communication through Selective Engagement and Evidence-Driven Integration

Authors: Chuxiong Sun, Zehua Zang, Jiabao Li, Jiangmeng Li, Xiao Xu, Rui Wang, Changwen Zheng

Abstract: Communication stands as a potent mechanism to harmonize the behaviors of multiple agents. However, existing works primarily concentrate on broadcast communication, which not only lacks practicality, but also leads to information redundancy. This surplus, one-size-fits-all information could adversely impact the communication efficiency. Furthermore, existing works often resort to basic mechanisms to integrate observed and received information, impairing the learning process. To tackle these difficulties, we propose Targeted and Trusted Multi-Agent Communication (T2MAC), a straightforward yet effective method that enables agents to learn selective engagement and evidence-driven integration. With T2MAC, agents have the capability to craft individualized messages, pinpoint ideal communication windows, and engage with reliable partners, thereby refining communication efficiency. Following the reception of messages, the agents integrate information observed and received from different sources at an evidence level. This process enables agents to collectively use evidence garnered from multiple perspectives, fostering trusted and cooperative behaviors. We evaluate our method on a diverse set of cooperative multi-agent tasks, with varying difficulties, involving different scales and ranging from Hallway, MPE to SMAC. The experiments indicate that the proposed model not only surpasses the state-of-the-art methods in terms of cooperative performance and communication efficiency, but also exhibits impressive generalization.

Submitted 19 January, 2024; originally announced January 2024.

Comments: AAAI24

arXiv:2401.08621 [pdf, other]

Algebraic structure of the Gaussian-PDMF space and applications on fuzzy equations

Authors: Chuang Zheng

Abstract: In this paper, we extend the research presented in [Wang and Zheng, Fuzzy Sets and Systems, p108581, 2023] by establishing the algebraic structure of the Gaussian Probability Density Membership Function (Gaussian-PDMF) space. We consider fixed objective and subjective entities, denoted as $(h,p)$, and provide the explicit form of the membership function. Consequently, every fuzzy number with the membership function in $X_{h,p}(\mathbb{R})$, denoted as $\tilde{x}$, can be uniquely identified by a vector $\langle x; d^-, d^+, μ^-,μ^+\rangle$. Here, $x\in \mathbb{R}$ represents the "leading factor" of the fuzzy number $\tilde{x}$ with a membership degree equal to $1$. The parameters $d^-$ (left side) and $d^+$ (right side) denote the lengths of the compact support, while $μ^-$ (left side) and $μ^+$ (right side) represent the shapes. We introduce five operators: addition, subtraction, multiplication, scalar multiplication, and division. We demonstrate that, based on our definitions, the Gaussian-PDMF space exhibits a well-defined algebraic structure. For instance, $X_{h,p}(\mathbb{R})$ is a vector space over $\mathbb{R}$, featuring a subspace that forms a division ring, allowing for the representation of fuzzy polynomials, among other properties. We provide several examples to illustrate our theoretical results.

Submitted 5 December, 2023; originally announced January 2024.

Comments: 23 pages, 5 figures

MSC Class: 03E72

arXiv:2401.07513 [pdf, other]

Detector performance of the Gamma-ray Transient Monitor onboard DRO-A Satellite

Authors: Pei-Yi Feng, Zheng-Hua An, Da-Li Zhang, Chen-Wei Wang, Chao Zheng, Sheng Yang, Shao-Lin Xiong, Jia-Cong Liu, Xin-Qiao Li, Ke Gong, Xiao-Jing Liu, Min Gao, Xiang-Yang Wen, Ya-Qing liu, Xiao-Yun Zhao, Fan Zhang, Xi-Lei Sun, Hong Lu

Abstract: Gamma-ray Transient Monitor (GTM) is an all-sky monitor onboard the Distant Retrograde Orbit-A (DRO-A) satellite with the scientific objective of detecting gamma-ray transients ranging from 20 keV to 1 MeV. GTM is equipped with 5 Gamma-ray Transient Probe (GTP) detector modules, utilizing the NaI(Tl) scintillator coupled with a SiPM array. To reduce the SiPM noise, GTP makes use of a dedicated dual-channel coincident readout design. In this work, we firstly studied the impact of different coincidence times on detection efficiency and ultimately selected the 500 ns time coincidence window for offline data processing. To test the performance of GTPs and validate the Monte Carlo simulated energy response, we conducted comprehensive ground calibration tests using Hard X-ray Calibration Facility (HXCF) and radioactive sources, including energy response, detection efficiency, spatial response, bias-voltage response, and temperature dependence. We extensively presented the ground calibration results, and validated the design and mass model of GTP detector. These work paved the road for the in-flight observation and science data analysis.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 13 pages, 25 figures

arXiv:2401.05062 [pdf, other]

Discrete conformal structures on surfaces with boundary (I) -- Classification

Authors: Xu Xu, Chao Zheng

Abstract: In this paper, we introduce the discrete conformal structures on surfaces with boundary in an axiomatic approach pioneered by Glickenstein \cite{Glickenstein}. This ensures that the Poincaré dual of an ideally triangulated surface with boundary has a good geometric structure. Then we classify the discrete conformal structures on surfaces with boundary, which turns out to unify and generalize Guo-Luo's generalized circle packings \cite{GL2}, Guo's vertex scalings \cite{Guo} and Xu's discrete conformal structures \cite{Xu22} on surfaces with boundary. The relationships between the discrete conformal structures on surfaces with boundary and the 3-dimensional hyperbolic geometry are also discussed.

Submitted 10 January, 2024; originally announced January 2024.

MSC Class: (2020): 52C25; 52C26

arXiv:2401.05056 [pdf, ps, other]

A discrete uniformization theorem for decorated piecewise hyperbolic metrics on surfaces

Authors: Xu Xu, Chao Zheng

Abstract: In this paper, we study a natural discretization of the smooth Gaussian curvature on surfaces. A discrete uniformization theorem is established for this discrete Gaussian curvature. We further investigate the prescribing combinatorial curvature problem for a parametrization of this discrete Gaussian curvature, which is called the combinatorial $α$-curvature. To find decorated piecewise hyperbolic metrics with prescribed combinatorial $α$-curvatures, we introduce the combinatorial $α$-Ricci flow for decorated piecewise hyperbolic metrics. To handle the potential singularities along the combinatorial $α$-Ricci flow, we do surgery along the flow by edge flipping under the weighted Delaunay condition. Then we prove the longtime existence and convergence of the combinatorial $α$-Ricci flow with surgery. As an application of the combinatorial $α$-Ricci flow with surgery, we give the existence of decorated piecewise hyperbolic metrics with prescribed combinatorial $α$-curvatures. We further introduce the combinatorial $α$-Calabi flow with surgery and study its longtime behavior.

Submitted 10 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.02271

MSC Class: (2020): 52C26

Showing 51–100 of 758 results for author: Zheng, C