-
Locally Differentially Private In-Context Learning
Authors:
Chunyan Zheng,
Keke Sun,
Wenhao Zhao,
Haibo Zhou,
Lixin Jiang,
Shaoyang Song,
Chunlai Zhou
Abstract:
Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability. An important application in deploying large language models is to augment LLMs with a private database for a specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data, and their prompt data are vulnerable to membership inference attacks (MIA) and prompt leaking attacks. To deal with this problem, we treat LLMs as untrusted in terms of privacy and propose a locally differentially private framework of in-context learning (LDP-ICL) in settings where labels are sensitive. Considering the gradient-descent mechanism of in-context learning in Transformers, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. Finally, we perform several experiments to validate our analysis.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
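Label-only local differential privacy of the kind described above is often built on k-ary randomized response (k-RR). k-RR also underlies standard LDP discrete distribution estimation. The sketch below assumes k-RR as the perturbation primitive. The abstract does not say which mechanism LDP-ICL actually uses, so `krr_perturb` and `krr_estimate` are hypothetical illustrations, not the paper's code. In an LDP-ICL-style pipeline, one could perturb each demonstration label locally before it enters the prompt. The aggregator could then still recover the label distribution without bias.

```python
import math
import random
from collections import Counter


def krr_perturb(label: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Release one label from {0, ..., k-1} under epsilon-LDP via k-ary randomized response."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return label
    # Otherwise report one of the other k - 1 labels uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < label else other + 1


def krr_estimate(reports: list[int], k: int, epsilon: float) -> list[float]:
    """Unbiased estimate of the true label frequencies from perturbed reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # P[report == true label]
    q = 1.0 / (e + k - 1)  # P[report == one specific other label]
    counts = Counter(reports)
    # E[count_j / n] = pi_j * p + (1 - pi_j) * q, solved for pi_j.
    return [(counts.get(j, 0) / n - q) / (p - q) for j in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [0] * 6000 + [1] * 3000 + [2] * 1000
    noisy = [krr_perturb(y, 3, 2.0, rng) for y in true_labels]
    print([round(x, 2) for x in krr_estimate(noisy, 3, 2.0)])
```

The privacy level is set by `epsilon`: the true label survives with probability e^ε / (e^ε + k - 1). Smaller ε therefore means noisier labels and a higher-variance estimate, which is the privacy-utility trade-off at stake.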
-
Stellar X-ray activity and habitability revealed by ROSAT sky survey
Authors:
Henggeng Han,
Song Wang,
Chuanjie Zheng,
Xue Li,
Kai Xiao,
Jifeng Liu
Abstract:
Using the homogeneous X-ray catalog from ROSAT observations, we conducted a comprehensive investigation into stellar X-ray activity-rotation relations for both single and binary stars. Generally, the relation for single stars consists of two distinct regions: a weak decay region, indicating a continued dependence of the magnetic dynamo on stellar rotation rather than a saturation regime with constant activity, and a rapid decay region, where X-ray activity is strongly correlated with the Rossby number. Detailed analysis reveals finer structure within the relation. In the extremely fast rotating regime, X-ray activity decreases with increasing rotation rate, a behavior referred to as super-saturation. In the extremely slow rotating regime, the relation flattens, mainly due to the scatter of F stars. This scatter may result from intrinsic variability in stellar activity over a stellar cycle or from the presence of different dynamo mechanisms. Binaries exhibit a relation similar to that of single stars, although the limited sample size prevented the identification of fine structure in the relation for binaries. We calculated the mass loss rates of planetary atmospheres driven by X-ray emission from host stars. Our findings indicate that an Earth-like planet within the stellar habitable zone would easily lose its entire primordial H/He envelope (equivalent to about 1% of the planetary mass).
Submitted 20 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Heterogeneous network and graph attention auto-encoder for LncRNA-disease association prediction
Authors:
Jin-Xing Liu,
Wen-Yu Xi,
Ling-Yun Dai,
Chun-Hou Zheng,
Ying-Lian Gao
Abstract:
Emerging research shows that lncRNAs are associated with a series of complex human diseases. However, most existing methods have limitations in identifying nonlinear lncRNA-disease associations (LDAs), and predicting new LDAs remains a major challenge. At the same time, accurate identification of LDAs is very important for the early warning and treatment of diseases. In this work, multiple sources of biomedical data are fully utilized to construct characteristics of lncRNAs and diseases, and linear and nonlinear characteristics are effectively integrated. Furthermore, we propose a novel deep learning model based on a graph attention auto-encoder, called HGATELDA. First, the linear characteristics of lncRNAs and diseases are constructed from the miRNA-lncRNA interaction matrix and the miRNA-disease interaction matrix. Next, the nonlinear features of diseases and lncRNAs are extracted using a graph attention auto-encoder, which largely retains critical information and effectively aggregates the neighborhood information of nodes. Finally, LDAs are predicted by fusing the linear and nonlinear characteristics of diseases and lncRNAs. HGATELDA achieves an AUC of 0.9692 under 5-fold cross-validation, indicating superior performance compared with several recent prediction models. Case studies further demonstrate the effectiveness of HGATELDA in identifying novel LDAs. Overall, HGATELDA appears to be a viable computational model for predicting LDAs.
Submitted 2 May, 2024;
originally announced May 2024.
-
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Authors:
Piotr Padlewski,
Max Bain,
Matthew Henderson,
Zhongkai Zhu,
Nishant Relan,
Hai Pham,
Donovan Ong,
Kaloyan Aleksiev,
Aitor Ormazabal,
Samuel Phua,
Ethan Yeo,
Eugenie Lamprecht,
Qi Liu,
Yuqi Wang,
Eric Chen,
Deyu Fu,
Lei Li,
Che Zheng,
Cyprien de Masson d'Autume,
Dani Yogatama,
Mikel Artetxe,
Yi Tay
Abstract:
We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging, with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, more than 50% of the questions in our hard set are answered incorrectly by all frontier models. We explore the nuances of designing, evaluating, and ranking models on ultra-challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates with human judgment. We offer free API access for lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on Vibe-Eval's automatic scores. We release the evaluation code and data; see https://github.com/reka-ai/reka-vibe-eval
Submitted 3 May, 2024;
originally announced May 2024.
-
Explicitly Modeling Universality into Self-Supervised Learning
Authors:
Jingyao Wang,
Wenwen Qiang,
Zeen Song,
Lingyu Si,
Jiangmeng Li,
Changwen Zheng,
Bing Su
Abstract:
The goal of universality in self-supervised learning (SSL) is to learn universal representations from unlabeled data that achieve excellent performance on all samples and tasks. However, existing methods lack explicit modeling of universality in the learning objective, and the related theoretical understanding remains limited. This may cause models to overfit in data-scarce situations and generalize poorly in real-world settings. To address these issues, we provide a theoretical definition of universality in SSL, which constrains both the learning and evaluation universality of SSL models from the perspectives of discriminability, transferability, and generalization. We then propose a $σ$-measurement to help quantify the universality of an SSL model. Based on the definition and measurement, we propose a general SSL framework, called GeSSL, that explicitly models universality in SSL. It introduces a self-motivated target based on the $σ$-measurement, which enables the model to find the optimal update direction towards universality. Extensive theoretical and empirical evaluations demonstrate the superior performance of GeSSL.
Submitted 23 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference
Authors:
Haoxuan Li,
Chunyuan Zheng,
Sihao Ding,
Peng Wu,
Zhi Geng,
Fuli Feng,
Xiangnan He
Abstract:
Selection bias in recommender systems arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model. However, they ignore the fact that the potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, a phenomenon called the neighborhood effect. To fill this gap, this paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference and introduces a treatment representation to capture it. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of the neighborhood effect. We further develop two new estimators of the proposed ideal loss. We theoretically establish the connection between the proposed methods and previous debiasing methods that ignore the neighborhood effect. We show that the proposed methods can achieve unbiased learning when both selection bias and the neighborhood effect are present, whereas the existing methods are biased. Extensive semi-synthetic and real-world experiments demonstrate the effectiveness of the proposed methods.
Submitted 30 April, 2024;
originally announced April 2024.
-
Debiased Collaborative Filtering with Kernel-Based Causal Balancing
Authors:
Haoxuan Li,
Chunyuan Zheng,
Yanghao Xiao,
Peng Wu,
Zhi Geng,
Xu Chen,
Peng Cui
Abstract:
Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. A simple and effective approach to this problem is based on the propensity score, which adjusts the observational sample distribution to the target one by reweighting observed instances. Ideally, propensity scores should be learned with causal balancing constraints. However, existing methods usually ignore such constraints or implement them with unreasonable approximations, which may affect the accuracy of the learned propensity scores. To bridge this gap, we first analyze the gaps between the causal balancing requirements and existing methods, such as learning the propensity with cross-entropy loss or manually selecting functions to balance. Inspired by these gaps, we propose to approximate the balancing functions in a reproducing kernel Hilbert space. Based on the universal property and the representer theorem of kernel functions, we demonstrate that the causal balancing constraints can then be better satisfied. Meanwhile, we propose an algorithm that adaptively balances the kernel function, and we theoretically analyze the generalization error bound of our methods. Extensive experiments demonstrate the effectiveness of our methods. To promote this research direction, we have released our project at https://github.com/haoxuanli-pku/ICLR24-Kernel-Balancing.
Submitted 30 April, 2024;
originally announced April 2024.
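The propensity-score reweighting mentioned above is, in its simplest form, the inverse propensity scoring (IPS) estimator: each observed pair's error is divided by its probability of being observed. The sketch below is that generic estimator, not the kernel-balancing method. It assumes the propensities are known, whereas the paper's contribution is learning them under causal balancing constraints. The toy data generator and the linear propensity model in `simulate` are hypothetical.

```python
import random


def naive_loss(errors: list[float], observed: list[bool]) -> float:
    """Average prediction error over observed user-item pairs only."""
    obs = [e for e, o in zip(errors, observed) if o]
    return sum(obs) / len(obs)


def ips_loss(errors: list[float], observed: list[bool], propensities: list[float]) -> float:
    """Inverse-propensity-scored (IPS) estimate of the average error over ALL pairs."""
    n = len(errors)
    return sum(e / p for e, o, p in zip(errors, observed, propensities) if o) / n


def simulate(n: int = 200_000, seed: int = 0) -> tuple[float, float, float]:
    """Toy data where high-error pairs are less likely to be observed (selection bias)."""
    rng = random.Random(seed)
    errors = [rng.random() for _ in range(n)]
    propensities = [0.9 - 0.8 * e for e in errors]  # known here; learned in practice
    observed = [rng.random() < p for p in propensities]
    ideal = sum(errors) / n  # the loss we would compute if every pair were observed
    return ideal, naive_loss(errors, observed), ips_loss(errors, observed, propensities)


if __name__ == "__main__":
    ideal, naive, ips = simulate()
    print(f"ideal={ideal:.3f} naive={naive:.3f} ips={ips:.3f}")
```

On this toy data, the naive average over observed pairs is biased downward because high-error pairs are under-observed. The IPS estimate tracks the ideal loss instead. Its accuracy depends entirely on the propensities, which is why how they are learned matters.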
-
Weak-to-Strong Extrapolation Expedites Alignment
Authors:
Chujie Zheng,
Ziqi Wang,
Heng Ji,
Minlie Huang,
Nanyun Peng
Abstract:
The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the literature on model interpolation, we propose a simple method called ExPO to boost LLMs' alignment with human preference. ExPO takes a model that has undergone alignment training (e.g., via DPO or RLHF) and its initial SFT checkpoint. It then directly obtains a better-aligned model by extrapolating from the weights of the initial and aligned models, which implicitly optimizes the alignment objective via a first-order approximation. Through experiments with twelve open-source LLMs on HuggingFace, we demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models, as evaluated on the mainstream LLM benchmarks AlpacaEval 2.0 and MT-Bench. Moreover, ExPO exhibits remarkable scalability across various model sizes (from 1.8B to 70B) and capabilities. Through controlled experiments and further empirical analyses, we shed light on the essence of ExPO: amplifying the reward signal learned during alignment training. Our work demonstrates the efficacy of model extrapolation in expediting the alignment of LLMs with human preference, suggesting a promising direction for future research.
Submitted 22 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
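ExPO's core step, as described above, extrapolates from the SFT weights past the aligned weights. The update is theta_new = theta_aligned + alpha * (theta_aligned - theta_sft). The sketch below makes three assumptions. Checkpoints are plain dicts of flat float lists, which are hypothetical stand-ins for framework state dicts. `alpha` is a user-chosen strength. `extrapolate_weights` is an illustrative name, not the authors' code. How the paper chooses `alpha`, and any per-layer handling, are not reproduced.

```python
def extrapolate_weights(
    sft: dict[str, list[float]],
    aligned: dict[str, list[float]],
    alpha: float,
) -> dict[str, list[float]]:
    """Move past the aligned weights along the SFT -> aligned direction.

    theta_new = theta_aligned + alpha * (theta_aligned - theta_sft)
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return merged


if __name__ == "__main__":
    sft = {"layer.weight": [0.0, 1.0, -2.0]}
    aligned = {"layer.weight": [0.5, 1.0, -1.0]}
    print(extrapolate_weights(sft, aligned, alpha=0.5))
```

With `alpha = 0`, the aligned model is returned unchanged. Larger values push further along the direction in which alignment training moved the weights, which matches the description of ExPO as amplifying what alignment training learned.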
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
Jinshan Pan,
Jiangxin Dong,
Jinhui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi Jin,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation and process images in under 10 ms. Of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
Submitted 25 April, 2024;
originally announced April 2024.
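The challenge summary above compares methods by PSNR gain over Lanczos interpolation. Below is the standard PSNR definition, PSNR = 10 * log10(MAX^2 / MSE), written as a minimal sketch. It assumes 8-bit pixel values (MAX = 255) flattened into equal-length lists. Whether the challenge measured PSNR on RGB or luminance, or cropped image borders, is not stated here.

```python
import math


def psnr(reference: list[float], reconstruction: list[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, `psnr([10, 10], [11, 9])` has an MSE of 1 and gives about 48.13 dB. A higher PSNR than the Lanczos baseline means a lower mean squared error against the ground-truth 4K image.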
-
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Authors:
Tianyuan Zhang,
Hong-Xing Yu,
Rundi Wu,
Brandon Y. Feng,
Changxi Zheng,
Noah Snavely,
Jiajun Wu,
William T. Freeman
Abstract:
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects, such as object stiffness, and grounding the 3D motion prediction in these properties. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
Submitted 19 April, 2024;
originally announced April 2024.
-
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Authors:
Reka Team,
Aitor Ormazabal,
Che Zheng,
Cyprien de Masson d'Autume,
Dani Yogatama,
Deyu Fu,
Donovan Ong,
Eric Chen,
Eugenie Lamprecht,
Hai Pham,
Isaac Ong,
Kaloyan Aleksiev,
Lei Li,
Matthew Henderson,
Max Bain,
Mikel Artetxe,
Nishant Relan,
Piotr Padlewski,
Qi Liu,
Ren Chen,
Samuel Phua,
Yazheng Yang,
Yi Tay,
Yuqi Wang,
Zhongkai Zhu
, et al. (1 additional author not shown)
Abstract:
We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models can process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 in human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai. A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai.
Submitted 18 April, 2024;
originally announced April 2024.
-
Meta-Auxiliary Learning for Micro-Expression Recognition
Authors:
Jingyao Wang,
Yunhan Tian,
Yuxuan Yang,
Xiaoxin Chen,
Changwen Zheng,
Wenwen Qiang
Abstract:
Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection. However, despite their wide applications in various scenarios, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (i) data level: lack of data and imbalanced classes; (ii) feature level: subtle, rapidly changing, and complex features of MEs; and (iii) decision-making level: the impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process. (i) At the first level, it obtains task-specific MER knowledge by learning in two branches. The first branch learns MER features via primary MER tasks. The other branch guides the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, given their resemblance in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that may damage its generalization ability. (ii) At the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.
Submitted 18 April, 2024;
originally announced April 2024.
-
Intriguing Properties of Positional Encoding in Time Series Forecasting
Authors:
Jianqi Zhang,
Jingyao Wang,
Wenwen Qiang,
Fanjiang Xu,
Changwen Zheng,
Fuchun Sun,
Hui Xiong
Abstract:
Transformer-based methods have made significant progress in time series forecasting (TSF). They primarily handle two types of tokens: temporal tokens, which contain all variables at the same timestamp, and variable tokens, which contain all input time points for a specific variable. Transformer-based methods rely on positional encoding (PE) to mark tokens' positions, enabling the model to perceive the correlation between tokens. However, in TSF, research on PE remains insufficient. To address this gap, we conduct experiments and uncover intriguing properties of existing PEs in TSF: (i) the positional information injected by PEs diminishes as network depth increases; (ii) enhancing positional information in deep networks is advantageous for improving the model's performance; (iii) PE based on the similarity between tokens can improve the model's performance. Motivated by these findings, we introduce two new PEs: Temporal Position Encoding (T-PE) for temporal tokens and Variable Positional Encoding (V-PE) for variable tokens. Both T-PE and V-PE incorporate geometric PE based on tokens' positions and semantic PE based on the similarity between tokens, but with different calculations. To leverage both PEs, we design a Transformer-based dual-branch framework named T2B-PE. It first calculates the correlation among temporal tokens and among variable tokens separately, and then fuses the dual-branch features through a gated unit. Extensive experiments demonstrate the superior robustness and effectiveness of T2B-PE. The code is available at: https://github.com/jlu-phyComputer/T2B-PE.
Submitted 16 April, 2024;
originally announced April 2024.
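The two ingredients named above, a geometric PE based on tokens' positions and a semantic PE based on the similarity between tokens, can be illustrated with two generic building blocks. The first is the standard sinusoidal positional encoding from the original Transformer. The second is a cosine-similarity matrix between token vectors. This is only a sketch of these generic ingredients under that assumption. The actual T-PE and V-PE formulas, and how T2B-PE combines them, are not given in the abstract and are not reproduced.

```python
import math


def sinusoidal_pe(num_positions: int, d_model: int) -> list[list[float]]:
    """Fixed sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, d_model, 2):
            # i is the even index 2k, so the frequency exponent is 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def cosine_similarity_matrix(tokens: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity between token vectors (zero vectors treated as norm 1)."""
    norms = [math.sqrt(sum(v * v for v in t)) or 1.0 for t in tokens]
    return [
        [sum(a * b for a, b in zip(t, u)) / (nt * nu) for u, nu in zip(tokens, norms)]
        for t, nt in zip(tokens, norms)
    ]
```

The first function depends only on position indices. The second depends only on token content, which is the distinction the abstract draws between geometric and semantic PE.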
-
Searching for short-period variables in M31: method and catalogs
Authors:
Hongrui Gu,
Haibo Yuan,
Subo Dong,
Chenfa Zheng,
Shenzhe Cui,
Yi Ren,
Haozhu Fu,
Yang Huang,
Zhou Fan
Abstract:
Utilizing high-cadence, continuous g- and r-band data acquired over three nights with the 3.6-meter Canada-France-Hawaii Telescope (CFHT) to search for short-duration microlensing events, we conduct a systematic search for variables, transients, and asteroids across a $\sim1^\circ$ field of view of the Andromeda Galaxy (M 31). We present a catalog of 5859 variable stars, the most extensive compilation of short-period variable sources in M 31. We also detected 19 flares, predominantly associated with foreground M dwarfs in the Milky Way. In addition, we discovered 17 previously unknown asteroid candidates, which we subsequently reported to the Minor Planet Center. Lastly, we report a microlensing event candidate, C-ML-1, and present a preliminary analysis.
Submitted 10 April, 2024;
originally announced April 2024.
-
Quantum Network Tomography via Learning Isometries on Stiefel Manifold
Authors:
Ze-Tong Li,
Xin-Lin He,
Cong-Cong Zheng,
Yu-Qian Dong,
Tian Luan,
Xu-Tao Yu,
Zai-Chen Zhang
Abstract:
Explicit mathematical reconstructions of quantum networks play a significant role in developing quantum information science. However, the tremendous number of parameters required and the implementation of physical constraints have become non-negligible computational burdens. In this work, we propose an efficient method for quantum network tomography by learning isometries on the Stiefel manifold. The task of reconstructing quantum networks is tackled by solving a series of unconstrained optimization problems with significantly fewer parameters. The stepwise isometry estimation can provide information about the truncated quantum comb during the tomography process. Remarkably, this method enables compressive quantum comb tomography by specifying the dimensions of the isometries. As a result, our proposed method exhibits high accuracy and efficiency.
Submitted 6 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
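Learning isometries on the Stiefel manifold requires keeping each parameter matrix V (with V†V = I) on the manifold during optimization. One generic option, assumed here for illustration, is to keep an unconstrained complex matrix and map it onto the manifold by Gram-Schmidt orthonormalization, i.e. the Q factor of a QR decomposition. The sketch shows only that mapping. The paper's actual optimizer, cost function, and quantum-comb construction are not reproduced, and `random_isometry` is a hypothetical helper.

```python
import math
import random


def orthonormalize_columns(columns: list[list[complex]]) -> list[list[complex]]:
    """Modified Gram-Schmidt: map d linearly independent length-D columns to an isometry V.

    The returned columns satisfy V^dagger V = I_d, i.e. V lies on the complex Stiefel manifold.
    """
    basis: list[list[complex]] = []
    for col in columns:
        v = [complex(x) for x in col]
        for b in basis:
            # Remove the component of v along b, using <b, v> = sum(conj(b_i) * v_i).
            coeff = sum(bi.conjugate() * vi for bi, vi in zip(b, v))
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        if norm < 1e-12:
            raise ValueError("columns are (numerically) linearly dependent")
        basis.append([vi / norm for vi in v])
    return basis


def gram_matrix(columns: list[list[complex]]) -> list[list[complex]]:
    """Return V^dagger V for V given as a list of its columns."""
    return [[sum(a.conjugate() * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def random_isometry(D: int, d: int, seed: int = 0) -> list[list[complex]]:
    """Draw an unconstrained complex D x d matrix (d <= D) and map it onto the Stiefel manifold."""
    rng = random.Random(seed)
    raw = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(D)] for _ in range(d)]
    return orthonormalize_columns(raw)
```

Every output of `orthonormalize_columns` satisfies V†V = I by construction. An optimizer can therefore update the raw matrix freely while the isometry constraint always holds. The number of columns d is the kind of isometry dimension the abstract refers to when it mentions compressive tomography.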
-
Collision-Free Trajectory Optimization in Cluttered Environments with Sums-of-Squares Programming
Authors:
Yulin Li,
Chunxin Zheng,
Kai Chen,
Yusen Xie,
Xindong Tang,
Michael Yu Wang,
Jun Ma
Abstract:
In this work, we propose a trajectory optimization approach for robot navigation in cluttered 3D environments. We represent the robot's geometry as a semialgebraic set defined by polynomial inequalities, so that robots with general shapes can be suitably characterized. To address the robot navigation task in obstacle-dense environments, we exploit the free space directly to construct a sequence of free regions and allocate each waypoint on the trajectory to a specific region. Then, we incorporate a uniform scaling factor for each free region and formulate a Sums-of-Squares (SOS) optimization problem that renders the containment relationship between the robot and the free space computationally tractable. The SOS optimization problem is further reformulated as a semidefinite program (SDP), and the collision-free constraints are shown to be equivalent to limiting the scaling factor along the entire trajectory. In this way, the robot at a specific configuration is constrained to stay within the free region. Next, to solve the trajectory optimization problem with the proposed safety constraints (which depend implicitly on the robot configurations), we derive the analytical gradient of the minimum scaling factor with respect to the robot configuration. This allows gradient-based methods to solve the trajectory optimization problem efficiently. Through a series of simulations and real-world experiments, the proposed approach is validated in various challenging scenarios, and the results demonstrate its effectiveness in generating collision-free trajectories in dense and intricate environments populated with obstacles.
Submitted 8 April, 2024;
originally announced April 2024.
-
Efficient Learnable Collaborative Attention for Single Image Super-Resolution
Authors:
Yigang Zhao,
Chaowei Zheng,
Jiannan Su,
Guangyong Chen,
Min Gan
Abstract:
Non-Local Attention (NLA) is a powerful technique for capturing long-range feature correlations in deep single image super-resolution (SR). However, NLA suffers from high computational complexity and memory consumption, as it requires aggregating all non-local feature information for each query response and recalculating the similarity weight distribution for different abstraction levels of features. To address these challenges, we propose a novel Learnable Collaborative Attention (LCoA) that introduces inductive bias into non-local modeling. Our LCoA consists of two components: Learnable Sparse Pattern (LSP) and Collaborative Attention (CoA). LSP uses the k-means clustering algorithm to dynamically adjust the sparse attention pattern of deep features, which reduces the number of non-local modeling rounds compared with existing sparse solutions. CoA leverages the sparse attention pattern and weights learned by LSP, and co-optimizes the similarity matrix across different abstraction levels, which avoids redundant similarity matrix calculations. The experimental results show that our LCoA can reduce the non-local modeling time by about 83% in the inference stage. In addition, we integrate our LCoA into a deep Learnable Collaborative Attention Network (LCoAN), which achieves competitive performance in terms of inference time, memory consumption, and reconstruction quality compared with other state-of-the-art SR methods.
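To make the sparse-pattern idea concrete, the sketch below clusters feature vectors with plain k-means and lets each query attend only to keys in its own cluster, then counts how many query-key similarities were evaluated. This is an illustrative assumption, not LCoA itself: the paper's LSP is learnable and CoA shares weights across abstraction levels, neither of which is modeled here. The names `kmeans` and `clustered_attention` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain Lloyd's k-means, initialised with the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustered_attention(features: List[Vector], k: int) -> Tuple[List[Vector], List[int], int]:
    """Each query attends only to keys in its own k-means cluster.
    Returns (outputs, cluster labels, number of query-key similarities computed)."""
    labels = kmeans(features, k)
    dim = len(features[0])
    outputs, comparisons = [], 0
    for i, q in enumerate(features):
        keys = [j for j, lab in enumerate(labels) if lab == labels[i]]
        scores = [sum(a * b for a, b in zip(q, features[j])) / math.sqrt(dim) for j in keys]
        comparisons += len(keys)
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append([sum(w * features[j][c] for w, j in zip(weights, keys)) / z for c in range(dim)])
    return outputs, labels, comparisons


feats = [[1.0, 0.0], [-1.0, 0.0], [1.1, 0.0], [-1.1, 0.1], [0.9, 0.1], [-0.9, 0.0]]
out, labels, comps = clustered_attention(feats, k=2)
# Two clusters of three: 18 similarities instead of 36 for dense attention.
```

The saving grows with the number of tokens: with balanced clusters, dense attention costs $n^2$ similarities while the clustered version costs roughly $n^2/k$.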
Submitted 7 April, 2024;
originally announced April 2024.
-
A born ultramassive white dwarf-hot subdwarf super-Chandrasekhar candidate
Authors:
Changqing Luo,
Jiao Li,
Chuanjie Zheng,
Dongdong Liu,
Zhenwei Li,
Yangping Luo,
Peter Nemeth,
Bo Zhang,
Jianping Xiong,
Bo Wang,
Song Wang,
Yu Bai,
Qingzheng Li,
Pei Wang,
Zhanwen Han,
Jifeng Liu,
Yang Huang,
Xuefei Chen,
Chao Liu
Abstract:
Although a supernova is the well-known endpoint of an accreting white dwarf, alternative theoretical possibilities have been broadly discussed, such as an accretion-induced collapse (AIC) event as the endpoint of oxygen-neon (ONe) white dwarfs that either accrete up to, or merge to exceed, the Chandrasekhar limit (the maximum mass of a stable white dwarf). AIC is an important channel for forming neutron stars, especially in unusual systems that are hardly produced by core-collapse supernovae. However, observational evidence for this theoretically predicted event and its progenitors is very limited. In all of the known progenitors, the white dwarfs increase in mass by accretion. Here, we report the discovery of an intriguing binary system, Lan 11, consisting of a stripped core-helium-burning hot subdwarf and an unseen compact object of 1.08 to 1.35 $M_{\odot}$. Our binary population synthesis calculations, along with the non-detection in deep radio observations with the Five-hundred-meter Aperture Spherical Radio Telescope, strongly suggest that the latter is an ONe white dwarf. The total mass of this binary is 1.67 to 1.92 $M_{\odot}$, significantly exceeding the Chandrasekhar limit. Reproducing its evolutionary history indicates that this unique system has undergone two phases of common envelope ejection, implying that the massive ONe white dwarf was born massive rather than having grown by accretion from its companion. These results, together with the short orbital period of the binary (3.65 hours), suggest that the system will merge in 500-540 Myr, most likely triggering an AIC event, although the possibility of a Type Ia supernova cannot be fully ruled out. This finding provides valuable constraints on our understanding of stellar endpoints, whether they lead to an AIC or a supernova.
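The quoted 500-540 Myr merger time can be sanity-checked with the standard gravitational-wave inspiral time of a circular binary, $t = \frac{5 c^5 a^4}{256\, G^3 m_1 m_2 (m_1 + m_2)}$, with the separation $a$ obtained from the 3.65 h period via Kepler's third law. This is a back-of-envelope sketch, not the authors' calculation: it assumes a circular orbit and illustrative component masses of 1.2 $M_{\odot}$ (compact object) and 0.6 $M_{\odot}$ (hot subdwarf), chosen only so that both the compact-object mass and the total fall within the ranges reported above. The function name is hypothetical.

```python
import math

G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8        # speed of light [m/s]
M_SUN = 1.989e30   # solar mass [kg]
MYR = 3.156e13     # one million years [s]


def gw_merger_time_myr(m1_sun: float, m2_sun: float, period_hours: float) -> float:
    """Gravitational-wave inspiral time of a circular binary, in Myr."""
    m1, m2 = m1_sun * M_SUN, m2_sun * M_SUN
    total = m1 + m2
    period = period_hours * 3600.0
    # Kepler's third law gives the orbital separation from the period.
    a = (G * total * period ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return 5.0 * C ** 5 * a ** 4 / (256.0 * G ** 3 * m1 * m2 * total) / MYR


# Illustrative masses (assumption): 1.2 M_sun compact object + 0.6 M_sun hot subdwarf.
t_merge = gw_merger_time_myr(1.2, 0.6, 3.65)
print(f"merger time ~ {t_merge:.0f} Myr")
```

With these assumed masses the estimate lands at roughly 5 x 10^2 Myr, consistent in order of magnitude with the range in the abstract; different mass splits within the quoted ranges shift it by tens of Myr.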
Submitted 7 April, 2024;
originally announced April 2024.
-
Relation between the keV-MeV and TeV emission of GRB 221009A and its implications
Authors:
Yan-Qiu Zhang,
Hao-Xiang Lin,
Shao-Lin Xiong,
Zhuo Li,
Ming-Yu Ge,
Chen-Wei Wang,
Shu-Xu Yi,
Zhen Zhang,
Shuang-Nan Zhang,
Li-Ming Song,
Chao Zheng,
Wang-Chen Xue,
Jia-Cong Liu,
Wen-Jun Tan,
Yue Wang,
Wen-Long Zhang
Abstract:
Gamma-ray bursts (GRBs) are believed to launch relativistic jets, which generate prompt emission through internal processes and drive external shocks into the surrounding medium, accounting for the long-lasting afterglow emission. However, how the jet powers the external shock remains an open question. The unprecedented observations of the keV-MeV emission with GECAM and the TeV emission with LHAASO of GRB 221009A, the brightest burst observed so far, offer a great opportunity to study the prompt-to-afterglow transition and the early dynamical evolution of the external shock. In this letter, we find that the cumulative light curve of the keV-MeV emission fits the rising stage of the TeV light curve of GRB 221009A well, with a time delay of $4.45 \pm 0.26$\,s for the TeV emission. Moreover, both the rapid increase in the initial stage and the excess from about $T_0+260$\,s to $T_0+270$\,s in the TeV light curve can be interpreted as inverse Compton (IC) scattering of photons coming from the inner region by energetic electrons in the external shock. Our results not only reveal a close relation between the keV-MeV and TeV emission but also indicate a continuous, rather than impulsive, energy injection into the external shock. Assuming an energy injection rate proportional to the keV-MeV flux, we build a continuous energy injection model that fits the TeV light curve of GRB 221009A well and provides an estimate of the Lorentz factor of the jet.
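The comparison described above (a scaled, time-shifted cumulative keV-MeV curve matched to the TeV light curve) can be sketched as a grid search over the delay with a closed-form least-squares amplitude. This is an illustration under stated assumptions, not the authors' fitting procedure: it uses evenly binned, noise-free synthetic curves, integer-bin delays, and fits the whole curve rather than only the rising stage. The helpers `cumulative`, `delayed`, and `fit_delay` are hypothetical.

```python
import math
from typing import List, Tuple


def cumulative(flux: List[float], dt: float) -> List[float]:
    """Running time integral (fluence) of a binned light curve."""
    total, out = 0.0, []
    for f in flux:
        total += f * dt
        out.append(total)
    return out


def delayed(curve: List[float], shift_bins: int) -> List[float]:
    """Shift a curve later by shift_bins samples, padding with zeros."""
    return [0.0] * shift_bins + curve[: len(curve) - shift_bins]


def fit_delay(kev_flux: List[float], tev_flux: List[float], dt: float, max_shift: int) -> Tuple[float, float]:
    """Grid-search the delay (max_shift < len(kev_flux)) and least-squares scale
    that best map the cumulative keV-MeV curve onto the TeV curve."""
    cum = cumulative(kev_flux, dt)
    best = None
    for k in range(max_shift + 1):
        x = delayed(cum, k)
        xx = sum(v * v for v in x)
        if xx == 0.0:
            continue
        scale = sum(y * v for y, v in zip(tev_flux, x)) / xx  # closed-form LSQ amplitude
        resid = sum((y - scale * v) ** 2 for y, v in zip(tev_flux, x))
        if best is None or resid < best[0]:
            best = (resid, k * dt, scale)
    return best[1], best[2]


# Synthetic example: a Gaussian keV-MeV pulse; TeV = 3 x its fluence delayed by 9 bins (4.5 s).
dt = 0.5
kev = [math.exp(-((i - 20) / 6.0) ** 2) for i in range(60)]
tev = [3.0 * v for v in delayed(cumulative(kev, dt), 9)]
delay, scale = fit_delay(kev, tev, dt, max_shift=20)
```

On real data one would add uncertainties to the residual and a sub-bin delay, but the structure (cumulative curve, shift, scale, compare) is the same.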
Submitted 4 April, 2024;
originally announced April 2024.
-
Iterated Learning Improves Compositionality in Large Vision-Language Models
Authors:
Chenhao Zheng,
Jieyu Zhang,
Aniruddha Kembhavi,
Ranjay Krishna
Abstract:
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, recent investigations find that most, if not all, of our state-of-the-art vision-language models struggle with compositionality. They are unable to distinguish between images of "a girl in white facing a man in black" and "a girl in black facing a man in white". Moreover, prior work suggests that compositionality does not arise with scale: larger model sizes or more training data do not help. This paper develops a new iterated training algorithm that incentivizes compositionality. We draw on decades of cognitive science research that identifies cultural transmission (the need to teach a new generation) as a necessary inductive prior that incentivizes humans to develop compositional languages. Specifically, we reframe vision-language contrastive learning as the Lewis Signaling Game between a vision agent and a language agent, and operationalize cultural transmission by iteratively resetting one of the agents' weights during training. After every iteration, this training paradigm induces representations that become "easier to learn", a property of compositional languages: e.g., our model trained on CC3M and CC12M improves on standard CLIP by 4.7% and 4.0%, respectively, on the SugarCrepe benchmark.
Submitted 16 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Transport of the magnetic flux away from a decaying sunspot via convective motions
Authors:
Chenxi Zheng,
Thierry Roudier,
Brigitte Schmieder,
Guiping Ruan,
Jean-Marie Malherbe,
Yang Liu,
Yao Chen,
Wenda Cao
Abstract:
Aims. The aim of this paper is to examine the relationship between the decay of sunspots and convection, via the motion of families of granules, and how the diffusion mechanism of the magnetic field operates in a decaying sunspot. Methods. We report the decay of a sunspot observed by the 1.6 m Goode Solar Telescope (GST) with the TiO Broadband Filter Imager (BFI) and the Near-InfraRed Imaging Spectropolarimeter (NIRIS). The analysis was aided by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). First, we followed the decay of the sunspot with HMI data over three days by constructing its evolving area and total magnetic flux. Second, the high spatial and temporal resolution of the GST instruments allowed us to analyze the causes of the sunspot's decay. We then followed the emergence of granules in the moat region around the sunspot over six hours. The evolution of the trees of fragmenting granules (TFGs) was derived from their relationship with the horizontal surface flows. Results. We find that the area and total magnetic flux decrease exponentially over the course of the sunspot decay. We identified 22 moving magnetic features (MMFs) in the moats of pores, which is a signature of sunspot decay through diffusion. We note that the MMFs were constrained to follow the borders of TFGs during their journey away from the sunspot. Conclusions. The TFGs and their development contribute to the diffusion of the magnetic field outside the sunspot. Our analysis shows the important role of the TFGs in sunspot decay. Finally, the family of granules evacuates the magnetic field.
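An exponential decrease of area or flux is usually quantified by an e-folding time. The sketch below fits $\Phi(t) = \Phi_0 e^{-t/\tau}$ by ordinary least squares on $\ln \Phi$; it is a generic illustration, assuming positive, noise-free synthetic samples every 6 h over three days, and it does not reproduce the paper's HMI measurements or its fitting method. `fit_exponential_decay` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def fit_exponential_decay(t: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y = y0 * exp(-t / tau) by ordinary least squares on ln(y).
    Returns (y0, tau); tau is the e-folding time in the units of t."""
    n = len(t)
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    l_mean = sum(log_y) / n
    slope = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), -1.0 / slope


# Synthetic flux samples (illustrative values only, e.g. in Mx) every 6 h for three days.
hours = [6.0 * i for i in range(13)]
flux = [5e21 * math.exp(-h / 30.0) for h in hours]
phi0, tau = fit_exponential_decay(hours, flux)  # recovers about 5e21 and 30 h
```

Fitting in log space weights all samples equally in relative terms; with real, noisy data a weighted nonlinear fit may be preferable.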
Submitted 31 March, 2024;
originally announced April 2024.
-
TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors:
Shuai Zhang,
Huangxuan Zhao,
Zhenghong Zhou,
Guanjun Wu,
Chuansheng Zheng,
Xinggang Wang,
Wenyu Liu
Abstract:
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles as contrast agent fills the blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving rendering quality and speed under sparse sampling is important for observing the status and location of lesions. Current methods exhibit inadequate rendering quality with sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with an opacity offset over time, which effectively improves the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity of each Gaussian can be determined at any time point, which enables us to render the 2D DSA image at that specific moment. Additionally, we introduce a Smooth loss term in the loss function to mitigate overfitting that may arise when dealing with sparse-view scenarios. During training, we randomly prune Gaussians, thereby reducing the storage overhead of the model. Experimental results demonstrate that, compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. It also enables real-time rendering while maintaining low storage overhead. The code will be publicly available.
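The per-Gaussian opacity lookup can be sketched as follows. The abstract states only that an offset table is interpolated over time; the specifics here are assumptions: a table sampled uniformly on normalized time $[0, 1]$, linear interpolation, an additive offset on a base opacity, and clamping to $[0, 1]$ (the actual method may use a different activation). `opacity_at` is a hypothetical name.

```python
from typing import List


def opacity_at(base_opacity: float, offsets: List[float], t: float) -> float:
    """Opacity of one Gaussian at normalized time t in [0, 1].

    offsets[k] is the table entry at time k / (K - 1) (uniform grid, assumption);
    the table is linearly interpolated, added to the base opacity, and clamped."""
    K = len(offsets)
    if K == 1:
        offset = offsets[0]
    else:
        x = min(max(t, 0.0), 1.0) * (K - 1)  # continuous table index
        k = min(int(x), K - 2)               # left neighbour (t = 1 uses the last segment)
        frac = x - k
        offset = (1.0 - frac) * offsets[k] + frac * offsets[k + 1]
    return min(max(base_opacity + offset, 0.0), 1.0)


# A vessel region that brightens as contrast arrives, then partly washes out.
table = [0.0, 0.4, 0.2]
samples = [opacity_at(0.3, table, t) for t in (0.0, 0.25, 0.5, 1.0)]  # about 0.3, 0.5, 0.7, 0.5
```

Storing a short table per Gaussian keeps the time dependence cheap to evaluate at render time, which fits the real-time goal described above.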
Submitted 28 March, 2024;
originally announced March 2024.
-
GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm
Authors:
Chunhang Zheng,
Kechao Cai
Abstract:
Traditional approaches to semantic communication tasks rely on knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods require training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). In our approach, we first transform the input images into graph structures. We then leverage a GNN-based encoder to extract semantic information from the source data, and this extracted semantic information is transmitted through the channel. At the receiver's end, a GNN-based decoder reconstructs the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while removing the dependency on SNR. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
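The abstract does not say how images become graphs, so the sketch below uses one common, assumed construction purely for illustration: non-overlapping square patches become nodes (feature = flattened pixels), and edges join horizontally and vertically adjacent patches. GeNet may use superpixels or a different connectivity; `image_to_patch_graph` is hypothetical. Changing `patch` changes the number of nodes, the knob the abstract varies.

```python
from typing import List, Tuple


def image_to_patch_graph(
    image: List[List[float]], patch: int
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Split an H x W image into non-overlapping patch x patch tiles.
    Each tile becomes a node whose feature vector is its flattened pixels;
    undirected edges connect horizontally and vertically adjacent tiles."""
    rows, cols = len(image) // patch, len(image[0]) // patch
    features = []
    for r in range(rows):
        for c in range(cols):
            features.append(
                [image[r * patch + i][c * patch + j] for i in range(patch) for j in range(patch)]
            )
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))     # right neighbour
            if r + 1 < rows:
                edges.append((node, node + cols))  # bottom neighbour
    return features, edges


img = [[4.0 * i + j for j in range(4)] for i in range(4)]
nodes, edges = image_to_patch_graph(img, patch=2)
# 4 nodes; edges == [(0, 1), (0, 2), (1, 3), (2, 3)]
```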
Submitted 14 May, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making
Authors:
Shuai Ma,
Qiaoyi Chen,
Xinru Wang,
Chengbo Zheng,
Zhenhui Peng,
Ming Yin,
Xiaojuan Ma
Abstract:
In AI-assisted decision-making, humans often passively review the AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely engage in analytical thinking and to face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories of human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.
Submitted 25 March, 2024;
originally announced March 2024.
-
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Authors:
Linzhi Wu,
Xingyu Zhang,
Yakun Zhang,
Changyan Zheng,
Tiejun Liu,
Liang Xie,
Ye Yan,
Erwei Yin
Abstract:
Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches have greatly improved current lip reading systems. However, lip reading in cross-speaker scenarios, where the speaker identity changes, poses a challenging problem due to inter-speaker variability. A well-trained lip reading system may perform poorly when handling a brand-new speaker. To learn a speaker-robust lip reading model, a key insight is to reduce visual variations across speakers, preventing the model from overfitting to specific speakers. In this work, considering both the input visual clues and the latent representations of a hybrid CTC/attention architecture, we propose to exploit lip landmark-guided fine-grained visual clues instead of the frequently used mouth-cropped images as input features, diminishing speaker-specific appearance characteristics. Furthermore, a max-min mutual information regularization approach is proposed to capture speaker-insensitive latent representations. Experimental evaluations on public lip reading datasets demonstrate the effectiveness of the proposed approach under both intra-speaker and inter-speaker conditions.
Submitted 2 May, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Authors:
Ruining Li,
Chuanxia Zheng,
Christian Rupprecht,
Andrea Vedaldi
Abstract:
We introduce DragAPart, a method that, given an image and a set of drags as input, can generate a new image of the same object in a new state, compatible with the action of the drags. Unlike prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion model, not restricted to a specific kinematic structure or object category. To this end, we start from a pre-trained image generator and fine-tune it on a new synthetic dataset, Drag-a-Move, which we introduce. Combined with a new encoding for the drags and dataset randomization, the new model generalizes well to real images and different categories. Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
Submitted 22 March, 2024;
originally announced March 2024.
-
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
Authors:
Changmeng Zheng,
Dayong Liang,
Wengyu Zhang,
Xiao-Yong Wei,
Tat-Seng Chua,
Qing Li
Abstract:
This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address these issues, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, which achieves state-of-the-art results on ScienceQA and MMBench with significant improvements over previous methods.
Submitted 22 March, 2024;
originally announced March 2024.
-
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Authors:
Yuedong Chen,
Haofei Xu,
Chuanxia Zheng,
Bohan Zhuang,
Marc Pollefeys,
Andreas Geiger,
Tat-Jen Cham,
Jianfei Cai
Abstract:
We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. To accurately localize the Gaussian centers, we build a cost volume representation via plane sweeping, where the cross-view feature similarities stored in the cost volume provide valuable geometry cues for depth estimation. We also learn the other Gaussian primitives' parameters jointly with the Gaussian centers while relying only on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussians via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). More impressively, compared to the latest state-of-the-art method pixelSplat, MVSplat uses $10\times$ fewer parameters and infers more than $2\times$ faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.
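The plane-sweep idea can be illustrated in its simplest setting. The sketch below is a toy, not MVSplat's pipeline (which uses multi-view transformer features, homography warping, and a learned refinement of the cost volume): it assumes a rectified two-view, one-dimensional image, where a depth candidate $d$ maps to disparity $fB/d$. For each pixel it scores every depth candidate by feature similarity with the matching source pixel, then returns the softmax-weighted depth. `plane_sweep_depth` and its parameters are hypothetical.

```python
import math
from typing import List


def plane_sweep_depth(
    ref_feats: List[List[float]],
    src_feats: List[List[float]],
    depths: List[float],
    focal_baseline: float,
    temperature: float = 0.05,
) -> List[float]:
    """Per-pixel depth for a rectified 1-D two-view pair (toy setting).

    For each depth candidate d, the matching source pixel is x - f*B/d; the cost
    volume entry is the channel-averaged dot product of the two features, and the
    depth is the softmax-weighted average of the candidates."""
    result = []
    for x, f_ref in enumerate(ref_feats):
        scores = []
        for d in depths:
            xs = x - round(focal_baseline / d)  # disparity = f * B / depth
            if 0 <= xs < len(src_feats):
                sim = sum(a * b for a, b in zip(f_ref, src_feats[xs])) / len(f_ref)
            else:
                sim = -1e9  # candidate falls outside the source view
            scores.append(sim / temperature)
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        result.append(sum(w * d for w, d in zip(weights, depths)) / z)
    return result


def onehot(i: int, width: int) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(width)]


# Synthetic scene: every reference pixel x matches source pixel x - 3 (depth 4 when f*B = 12).
width, true_disp = 20, 3
src = [onehot(i, width) for i in range(width)]
ref = [onehot(x - true_disp, width) if x >= true_disp else [0.0] * width for x in range(width)]
depth = plane_sweep_depth(ref, src, [12.0, 6.0, 4.0, 3.0], focal_baseline=12.0, temperature=0.001)
```

A soft (softmax-weighted) readout rather than an argmax keeps the depth differentiable, which is what allows such a cost volume to be trained end to end with photometric supervision.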
Submitted 18 July, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
Authors:
Tianhao Wu,
Chuanxia Zheng,
Tat-Jen Cham,
Qianyi Wu
Abstract:
3D decomposition/segmentation remains a challenge because large-scale 3D annotated data is not readily available. Contemporary approaches typically leverage 2D machine-generated segments and integrate them for 3D consistency. While most of these methods are based on NeRFs, they share a potential weakness: the instance/semantic embedding features are derived from independent MLPs, which prevents the segmentation network from learning the geometric details of the objects directly through radiance and density. In this paper, we propose ClusteringSDF, a novel approach that achieves both segmentation and reconstruction in 3D via a neural implicit surface representation, specifically the Signed Distance Function (SDF), where segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although based on ObjectSDF++, ClusteringSDF no longer requires ground-truth segments for supervision; it maintains the capability of reconstructing individual object surfaces using only the noisy and inconsistent labels from pre-trained models. At the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting the 2D labels to 3D. Experimental results on challenging scenes from the ScanNet and Replica datasets show that ClusteringSDF achieves competitive performance compared with the state of the art, with significantly reduced training time.
Submitted 21 March, 2024;
originally announced March 2024.
-
Observation of spectral lines in the exceptional GRB 221009A
Authors:
Yan-Qiu Zhang,
Shao-Lin Xiong,
Ji-Rong Mao,
Shuang-Nan Zhang,
Wang-Chen Xue,
Chao Zheng,
Jia-Cong Liu,
Zhen Zhang,
Xi-Lu Wang,
Ming-Yu Ge,
Shu-Xu Yi,
Li-Ming Song,
Zheng-Hua An,
Ce Cai,
Xin-Qiao Li,
Wen-Xi Peng,
Wen-Jun Tan,
Chen-Wei Wang,
Xiang-Yang Wen,
Yue Wang,
Shuo Xiao,
Fan Zhang,
Peng Zhang,
Shi-Jie Zheng
Abstract:
As the brightest gamma-ray burst ever observed, GRB 221009A provided a precious opportunity to explore spectral line features. In this paper, we performed a comprehensive spectroscopic analysis of GRB 221009A jointly with GECAM-C and Fermi/GBM data to search for emission and absorption lines. For the first time, we investigated the line feature throughout this GRB, including the brightest part, where many instruments suffered problems, and identified prominent emission lines in multiple time intervals. The central energy of the Gaussian emission line evolves from about 37 MeV to 6 MeV, with a nearly constant ratio (about 10\%) between the line width and the central energy. In particular, we find that both the central energy and the energy flux of the emission line decay with time as power laws, with indices of -1 and -2, respectively. We suggest that the observed emission lines most likely originate from the blue-shifted 511 keV electron-positron pair annihilation line. We find that a standard high-latitude emission scenario cannot fully interpret the observations, so we propose that the emission line comes from dense clumps of electron-positron pairs traveling together with the jet. In this scenario, the emission line can be used to directly measure, for the first time, the bulk Lorentz factor of the jet ($\Gamma$) and reveal its time evolution (i.e., $\Gamma \sim t^{-1}$) during the prompt emission. Interestingly, we find that the flux of the annihilation line in the co-moving frame remains constant. These discoveries of spectral line features shed new and important light on the physics of GRBs and relativistic jets.
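A power-law index such as the $-1$ (central energy) and $-2$ (line energy flux) quoted above is commonly estimated as the least-squares slope in log-log space. The sketch below shows only that generic step on synthetic, noise-free samples shaped like the reported trends (energy falling from 37 to about 6 over arbitrary time units); it does not reproduce the paper's spectral fitting, and `power_law_index` is a hypothetical helper.

```python
import math
from typing import List


def power_law_index(t: List[float], y: List[float]) -> float:
    """Least-squares slope of ln(y) versus ln(t), i.e. alpha in y proportional to t**alpha."""
    lt = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lt)
    mt, my = sum(lt) / n, sum(ly) / n
    return sum((a - mt) * (b - my) for a, b in zip(lt, ly)) / sum((a - mt) ** 2 for a in lt)


# Synthetic, noise-free samples shaped like the reported trends (illustrative only).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
energy = [37.0 / ti for ti in times]         # central energy ~ t^-1 (37 down to about 6)
line_flux = [1.0 / ti ** 2 for ti in times]  # line energy flux ~ t^-2
alpha_e = power_law_index(times, energy)     # about -1
alpha_f = power_law_index(times, line_flux)  # about -2
```

The index depends on the choice of time origin $t$, so in practice the reference time must be stated alongside the fitted value.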
Submitted 28 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
GT-Rain Single Image Deraining Challenge Report
Authors:
Howard Zhang,
Yunhao Ba,
Ethan Yang,
Rishi Upadhyay,
Alex Wong,
Achuta Kadambi,
Yun Guo,
Xueyao Xiao,
Xiaoxiong Wang,
Yi Li,
Yi Chang,
Luxin Yan,
Chaochao Zheng,
Luping Wang,
Bin Liu,
Sunder Ali Khowaja,
Jiseok Yoon,
Ik-Hyun Lee,
Zhao Zhang,
Yanyan Wei,
Jiahuan Ren,
Suiyi Zhao,
Huan Zheng
Abstract:
This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aims of this competition are to study the rainy weather phenomenon in real-world scenarios, to provide a novel real-world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain consist of real rainy images and ground-truth images captured moments after the rain had stopped. 275 participants registered for the challenge, and 55 competed in the final testing phase.
Submitted 18 March, 2024;
originally announced March 2024.
-
Graph Partial Label Learning with Potential Cause Discovering
Authors:
Hang Gao,
Jiaguo Yuan,
Jiangmeng Li,
Peng Qiao,
Fengge Wu,
Changwen Zheng,
Huaping Liu
Abstract:
Graph Neural Networks (GNNs) have garnered widespread attention for their potential to address the challenges of graph representation learning on complex graph-structured data across various domains. However, due to the inherent complexity and interconnectedness of graphs, accurately annotating graph data for training GNNs is extremely challenging. To address this issue, we introduce Partial Label Learning (PLL) into graph representation learning. PLL is a critical weakly supervised learning problem in which each training instance is associated with a set of candidate labels, including the ground-truth label and additional interfering labels. PLL tolerates annotator errors, which reduces the difficulty of data labeling. We then propose a novel graph representation learning method that enables GNN models to effectively learn discriminative information in the PLL setting. Our approach uses potential cause extraction to obtain graph data that holds causal relationships with the labels. By conducting auxiliary training on the extracted graph data, our model can effectively eliminate interfering information in the PLL scenario. We support the rationale behind our method with a series of theoretical analyses. Moreover, we conduct extensive evaluations and ablation studies on multiple datasets, demonstrating the superiority of our proposed method.
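To make the PLL setting concrete: each example carries a candidate label set instead of a single label. One widely used baseline objective (an assumption here, not necessarily the loss used in this paper) is the negative log of the total probability the model assigns to the candidate set, $-\log \sum_{y \in S} p(y \mid x)$, which is minimized whenever the mass falls anywhere inside the set. `candidate_set_loss` is a hypothetical name.

```python
import math
from typing import Iterable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_set_loss(logits: List[float], candidates: Iterable[int]) -> float:
    """Baseline PLL loss: -log of the total probability on the candidate label set."""
    probs = softmax(logits)
    return -math.log(sum(probs[c] for c in candidates))


# Four classes, uniform prediction, annotator gave candidates {0, 1}: loss = log 2.
loss = candidate_set_loss([0.0, 0.0, 0.0, 0.0], {0, 1})
```

This loss alone cannot tell which candidate is correct; methods such as the one described above add extra signal (here, causally relevant subgraphs) to disambiguate.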
Submitted 21 May, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Authors:
Qucheng Peng,
Ce Zheng,
Chen Chen
Abstract:
3D human pose data collected in controlled laboratory settings make it difficult to train pose estimators that generalize across diverse scenarios. To address this, domain generalization is employed. Current methodologies in domain generalization for 3D human pose estimation typically use adversarial training to generate synthetic poses for training. Nonetheless, these approaches exhibit several limitations. First, the lack of prior information about the target domain complicates the application of suitable augmentation through a single pose augmentor, affecting generalization on target domains. Moreover, the discriminator in adversarial training tends to enforce similarity between source and synthesized poses, impeding the exploration of out-of-source distributions. Furthermore, the pose estimator's optimization is not exposed to domain shifts, limiting its overall generalization ability.
To address these limitations, we propose a novel framework featuring two pose augmentors: the weak and the strong augmentors. Our framework employs differential strategies for the generation and discrimination processes, facilitating both the preservation of knowledge related to source poses and the exploration of out-of-source distributions without prior information about target poses. In addition, we leverage meta-optimization to simulate domain shifts in the optimization process of the pose estimator, thereby improving its generalization ability. Our proposed approach significantly outperforms existing methods, as demonstrated through comprehensive experiments on various benchmark datasets. Our code will be released at https://github.com/davidpengucf/DAF-DG.
Submitted 19 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning
Authors:
Ke Zhang,
DanDan Zhu,
Qiuhan Xu,
Hao Zhou,
Ce Zheng
Abstract:
Training in multi-agent reinforcement learning (MARL) is a time-consuming process because of the distribution shift of each agent. One drawback is that each agent's strategy in MARL is learned independently, even though the agents actually cooperate. Thus, a central issue in multi-agent reinforcement learning is how to efficiently accelerate the training process. To address this problem, current research has leveraged a centralized function (CF) across multiple agents to learn each agent's contribution to the team reward. However, CF-based methods introduce joint error from other agents into the estimation of the value network. Therefore, inspired by federated learning, we propose three simple, novel mechanisms to accelerate the training of MARL: Average Periodically Parameter Sharing (A-PPS), Reward-Scalability Periodically Parameter Sharing (RS-PPS), and Partial Personalized Periodically Parameter Sharing (PP-PPS). Agents share their Q-value networks periodically during the training process. Agents with the same identity use the collected reward for scaling and update part of the neural network during each period to share different parameters. We apply our approaches to the classical MARL method QMIX and evaluate them on various tasks in the StarCraft Multi-Agent Challenge (SMAC) environment. Numerical experiments show substantial gains, with an average improvement of 10%-30%, and our approaches win tasks that QMIX cannot. Our code can be downloaded from https://github.com/ColaZhang22/PPS-QMIX
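The averaging idea behind A-PPS can be sketched in the style of federated averaging. Each agent updates its own parameters for a fixed number of steps, and then every agent adopts the element-wise mean. This is a minimal sketch under simplifying assumptions: parameters are flat lists of floats, and the `local_update` callback is a hypothetical stand-in for a QMIX gradient step. It does not attempt the reward-scaled (RS-PPS) or partially personalized (PP-PPS) variants.

```python
from typing import Callable, List

Params = List[float]  # flattened Q-network parameters of one agent (simplification)

def average_parameters(agent_params: List[Params]) -> Params:
    """Element-wise mean of all agents' parameter vectors (FedAvg-style)."""
    n = len(agent_params)
    return [sum(values) / n for values in zip(*agent_params)]

def train_with_periodic_sharing(
    agent_params: List[Params],
    local_update: Callable[[int, Params], Params],
    num_steps: int,
    period: int,
) -> List[Params]:
    """Each agent trains locally; every `period` steps all agents adopt the average."""
    params = [list(p) for p in agent_params]
    for step in range(1, num_steps + 1):
        # Independent local update per agent (hypothetical stand-in for a QMIX step).
        params = [local_update(i, p) for i, p in enumerate(params)]
        if step % period == 0:
            shared = average_parameters(params)
            params = [list(shared) for _ in params]
    return params
```

With a toy update such as `lambda i, p: [x + (i + 1) for x in p]`, three agents starting at `[0.0]` drift apart between sharing points and are pulled back to their common mean every `period` steps.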
Submitted 4 March, 2024;
originally announced March 2024.
-
Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Authors:
Chen Zheng,
Ke Sun,
Hang Wu,
Chenguang Xi,
Xun Zhou
Abstract:
In recent advancements in Conversational Large Language Models (LLMs), a concerning trend has emerged, showing that many new base LLMs experience a knowledge reduction in their foundational capabilities following Supervised Fine-Tuning (SFT). This process often leads to issues such as forgetting or a decrease in the base model's abilities. Moreover, fine-tuned models struggle to align with user preferences, inadvertently increasing the generation of toxic outputs when specifically prompted. To overcome these challenges, we adopted an innovative approach by completely bypassing SFT and directly implementing Harmless Reinforcement Learning from Human Feedback (RLHF). Our method not only preserves the base model's general capabilities but also significantly enhances its conversational abilities, while notably reducing the generation of toxic outputs. Our approach holds significant implications for fields that demand a nuanced understanding and generation of responses, such as customer service. We applied this methodology to Mistral, the most popular base model, thereby creating Mistral-Plus. Our validation across 11 general tasks demonstrates that Mistral-Plus outperforms similarly sized open-source base models and their corresponding instruct versions. Importantly, the conversational abilities of Mistral-Plus were significantly improved, indicating a substantial advancement over traditional SFT models in both safety and user preference alignment.
Submitted 4 March, 2024;
originally announced March 2024.
-
Recurrence Theorem for Open Quantum Systems
Authors:
Zhihang Liu,
Chao Zheng
Abstract:
Quantum (Poincaré) recurrence theorems are known for closed quantum (classical) systems. Can recurrence happen in open systems? We provide a recurrence theorem for open quantum systems via a non-Hermitian (NH) description. We find that PT symmetry and pseudo-Hermitian symmetry protect recurrence in NH open quantum systems, and that recurrence fails when the symmetry is broken.
Applying our theorem to PT-symmetric systems, we reveal why quantum recurrence happens in the PT-unbroken phase but fails in the PT-broken phase, a point that was previously misunderstood.
A contradiction emerges when we apply our theorem to anti-PT-symmetric systems. We resolve it, revealing that distinguishability and von Neumann entropy are generally not effective for describing the information dynamics of NH systems.
We develop a new approach to investigate the information dynamics of NH systems. For anti-PT-symmetric systems in the PT-broken phase, we find three information-dynamics patterns: oscillations with an overall decrease, oscillations with an overall increase, and periodic oscillations. The periodic oscillations (complete information retrieval) happen only if the spectrum of the NH Hamiltonian is real. When distinguishability or von Neumann entropy is used, the three patterns degenerate into periodic oscillations, because normalizing non-unitarily evolved states leads to a loss of information. We conclude with a discussion of the physical meaning behind recurrence in open systems and outline a direction for recurrence theorems beyond conservative systems in classical mechanics.
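The standard two-level PT-symmetric dimer illustrates why recurrence survives in the PT-unbroken phase and fails in the broken phase. Take H = [[iγ, κ], [κ, −iγ]] with coupling κ and gain/loss γ. Then H² = (κ² − γ²)I, so the evolution operator has the closed form

U(t) = exp(−iHt) = cos(ωt) I − i (sin(ωt)/ω) H, with ω = √(κ² − γ²).

When κ > γ, ω is real and U(2π/ω) = I, which is an exact recurrence. When γ > κ, ω is imaginary, the trigonometric functions become hyperbolic, and U(t) grows without returning to the identity. This is a textbook illustration only, not the paper's general theorem. It also looks at the unnormalized evolution operator and sidesteps the normalization issues discussed above.

```python
import cmath

def evolution_operator(kappa: float, gamma: float, t: float):
    """U(t) = exp(-i H t) for H = [[i*gamma, kappa], [kappa, -i*gamma]], via H^2 = omega^2 * I."""
    omega = cmath.sqrt(kappa ** 2 - gamma ** 2)  # real (unbroken) or imaginary (broken)
    if abs(omega) < 1e-12:
        raise ValueError("exceptional point: H is nilpotent there, so U = I - i*H*t")
    c = cmath.cos(omega * t)
    s = cmath.sin(omega * t) / omega
    h = [[1j * gamma, kappa], [kappa, -1j * gamma]]
    return [[(c if r == k else 0) - 1j * s * h[r][k] for k in range(2)] for r in range(2)]

def distance_from_identity(u) -> float:
    """Largest entry-wise deviation of a 2x2 matrix from the identity."""
    return max(abs(u[r][k] - (1 if r == k else 0)) for r in range(2) for k in range(2))
```

In the unbroken phase (for example κ = 1, γ = 0.6, so ω = 0.8), the state returns after the period 2π/ω. In the broken phase (κ = 0.6, γ = 1), the deviation from the identity grows roughly like cosh(√(γ² − κ²) t).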
Submitted 29 February, 2024;
originally announced February 2024.
-
Ultraviolet and Chromospheric activity and Habitability of M stars
Authors:
Xue Li,
Song Wang,
Henggeng Han,
Huiqin Yang,
Chuanjie Zheng,
Yang Huang,
Jifeng Liu
Abstract:
M-type stars are crucial for stellar activity studies because they cover two types of magnetic dynamos, and they are particularly intriguing for habitability studies due to their abundance and long main-sequence lifespans. In this paper, we used the LAMOST DR9 catalog and GALEX UV archive data to investigate the chromospheric and UV activities of M-type stars. All the chromospheric and UV activity indices clearly show the saturated and unsaturated regimes and the well-known activity-rotation relation, consistent with previous studies. Both the FUV and NUV activity indices exhibit a single-peaked distribution, while the Hα and Ca II H&K indices show a distinct double-peaked distribution. The gap between these peaks suggests a rapid transition from a saturated population to an unsaturated one. The smoothly varying distributions of different subtypes suggest a rotation-dependent dynamo for M stars ranging from early-type (partly convective) to late-type (fully convective). We identified a group of stars with high UV activity above the saturation regime ($\log R'_{\rm NUV} > -2.5$) but low chromospheric activity. The underlying reason is unknown. By calculating the continuously habitable zone and the UV habitable zone for each star, we found that about 70% of stars in the total sample and 40% of stars within 100 pc are located in the overlapping region of these two habitable zones, indicating that a number of M stars are potentially habitable. Finally, we examined the possibility of UV activity studies of M stars using the China Space Station Telescope.
Submitted 27 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
On the Expressive Power of a Variant of the Looped Transformer
Authors:
Yihang Gao,
Chuanyang Zheng,
Enze Xie,
Han Shi,
Tianyang Hu,
Yu Li,
Michael K. Ng,
Zhenguo Li,
Zhaoqiang Liu
Abstract:
Besides natural language processing, transformers exhibit extraordinary performance in broader applications, including scientific computing and computer vision. Previous works try to explain this from the perspectives of expressive power and capability, showing that standard transformers can perform some algorithms. To empower transformers with algorithmic capabilities, and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer block, dubbed Algorithm Transformer (abbreviated as AlgoFormer). Compared with the standard transformer and the vanilla looped transformer, the proposed AlgoFormer can achieve significantly higher expressiveness in algorithm representation when using the same number of parameters. In particular, inspired by the structure of human-designed learning algorithms, our transformer block consists of a pre-transformer responsible for task pre-processing, a looped transformer for iterative optimization algorithms, and a post-transformer that produces the desired results after post-processing. We provide theoretical evidence of the expressive power of AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms. Experimental results demonstrate the empirical superiority of the proposed transformer: it outperforms the standard transformer and the vanilla looped transformer on some challenging tasks.
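A toy analogy can show the pre-process / loop / post-process layout described above. Plain Python functions stand in for the three transformer stages. The looped stage is a single shared function applied repeatedly, mirroring how a looped transformer reuses the same weights on every iteration. One-dimensional least squares solved by gradient descent plays the role of the "iterative optimization algorithm". This is a hypothetical illustration of the structure only, not the paper's architecture or its theoretical construction. All names are invented for the sketch.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float]  # (mean x^2, mean x*y, query x, current weight w)

def algoformer_like(pre: Callable, looped: Callable, post: Callable, task, num_loops: int) -> float:
    """Pre-processing stage, one shared block applied num_loops times, then post-processing."""
    state = pre(task)
    for _ in range(num_loops):
        state = looped(state)  # same function (i.e. same weights) reused on every iteration
    return post(state)

def pre(task: Tuple[List[float], List[float], float]) -> State:
    """Turn in-context examples into sufficient statistics and an initial weight w = 0."""
    xs, ys, x_query = task
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    return (sxx, sxy, x_query, 0.0)

def make_gd_step(lr: float) -> Callable[[State], State]:
    """One gradient-descent step on the loss (1/2n) * sum_i (w * x_i - y_i)^2."""
    def looped(state: State) -> State:
        sxx, sxy, xq, w = state
        grad = sxx * w - sxy  # derivative of the loss with respect to w
        return (sxx, sxy, xq, w - lr * grad)
    return looped

def post(state: State) -> float:
    """Read out the prediction for the query point."""
    _, _, xq, w = state
    return w * xq
```

For in-context examples drawn from y = 2x, more loop iterations move the prediction at x = 4 toward 8. The depth of the computation is set by `num_loops` rather than by the number of distinct parameters.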
Submitted 21 February, 2024;
originally announced February 2024.
-
On Prompt-Driven Safeguarding for Large Language Models
Authors:
Chujie Zheng,
Fan Yin,
Hao Zhou,
Fandong Meng,
Jie Zhou,
Kai-Wei Chang,
Minlie Huang,
Nanyun Peng
Abstract:
Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation. We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction, in which models become more prone to refusing to provide assistance, even when the queries are harmless. On the other hand, LLMs are naturally capable of distinguishing harmful and harmless queries without safety prompts. Inspired by these findings, we propose a method for safety prompt optimization, namely DRO (Directed Representation Optimization). Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness. Experiments with eight LLMs on out-of-domain and jailbreak benchmarks demonstrate that DRO remarkably improves the safeguarding performance of human-crafted safety prompts, without compromising the models' general performance.
Submitted 3 June, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Weaver: Foundation Models for Creative Writing
Authors:
Tiannan Wang,
Jiamin Chen,
Qingrui Jia,
Shuai Wang,
Ruoyu Fang,
Huilin Wang,
Zhaowei Gao,
Chunzhao Xie,
Chuou Xu,
Jihong Dai,
Yibin Liu,
Jialong Wu,
Shengwei Ding,
Long Li,
Zhiwei Huang,
Xinle Deng,
Teng Yu,
Gangan Ma,
Han Xiao,
Zixin Chen,
Danjun Xiang,
Yunxia Wang,
Yuanyuan Zhu,
Yi Xiao,
Jing Wang
, et al. (21 additional authors not shown)
Abstract:
This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preferences of professional writers using a suite of novel methods for instruction data synthesis and LLM alignment, enabling it to produce more human-like text and follow more diverse instructions for content creation. The Weaver family consists of Weaver Mini (1.8B), Weaver Base (6B), Weaver Pro (14B), and Weaver Ultra (34B) models, which suit different applications and can be dynamically dispatched by a routing agent according to query complexity to balance response quality and computation cost. Evaluation on a carefully curated benchmark for assessing the writing capabilities of LLMs shows that Weaver models of all sizes outperform generalist LLMs several times larger than them. Notably, our most capable Weaver Ultra model surpasses GPT-4, a state-of-the-art generalist LLM, in various writing scenarios, demonstrating the advantage of training specialized LLMs for writing purposes. Moreover, Weaver natively supports retrieval-augmented generation (RAG) and function calling (tool usage). We present various use cases of these abilities for improving AI-assisted writing systems, including integration of external knowledge bases, tools, or APIs, and personalized writing assistance. Finally, we summarize guidelines and best practices for pre-training and fine-tuning domain-specific LLMs.
Submitted 30 January, 2024;
originally announced January 2024.
-
Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with Students
Authors:
Chengbo Zheng,
Kangyu Yuan,
Bingcan Guo,
Reza Hadi Mogavi,
Zhenhui Peng,
Shuai Ma,
Xiaojuan Ma
Abstract:
The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate about an alternative world where they could freely employ AI in PBL while needing to report this process to assess their skills and contributions. Our workshops yielded various scenarios of students' use of AI in PBL and ways of analyzing these uses, grounded in students' vision of how educational goals should be transformed. We also found that students with different attitudes toward AI exhibited distinct preferences for how to analyze and understand the use of AI. Based on these findings, we discuss future research opportunities on student-AI interaction and on understanding AI-enhanced learning.
Submitted 29 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors:
Sheng Hong,
Junjie He,
Xinhu Zheng,
Chunran Zheng,
Shaojie Shen
Abstract:
We introduce an integrated, precise LiDAR, Inertial, and Visual (LIV) multimodal sensor-fused mapping system that builds on differentiable Gaussians to improve mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion.
This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial surface Gaussians of the scene and the sensor pose of each frame are obtained using a LiDAR-inertial system with size-adaptive voxels. Then, we refine the Gaussians using photometric gradients derived from the visual data to optimize their quality and density.
Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, and supports both repetitive and non-repetitive scanning modes. It bolsters structure construction through LiDAR and facilitates real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes, potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics.
We release our software, hardware, and self-collected datasets to benefit the community.
Submitted 16 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
BayesPrompt: Prompting Large-Scale Pre-Trained Language Models on Few-shot Inference via Debiased Domain Abstraction
Authors:
Jiangmeng Li,
Fei Song,
Yifan Jin,
Wenwen Qiang,
Changwen Zheng,
Fuchun Sun,
Hui Xiong
Abstract:
As a novel and effective fine-tuning paradigm based on large-scale pre-trained language models (PLMs), prompt-tuning aims to reduce the gap between downstream tasks and pre-training objectives. While prompt-tuning has yielded continuous advancements in various tasks, it still has a persistent defect: prompt-tuning methods fail to generalize to specific few-shot patterns. From the perspective of distribution analyses, we disclose that the intrinsic issues behind this phenomenon are the over-multitudinous conceptual knowledge contained in PLMs and the abridged knowledge for target downstream domains, which jointly cause PLMs to mis-locate the knowledge distributions corresponding to the target domains in the universal knowledge embedding space. To this end, we explore approximating the unabridged target domains of downstream tasks in a debiased manner, and then abstracting such domains to generate discriminative prompts, thereby providing disambiguating guidance for PLMs. Guided by this intuition, we propose a simple yet effective approach, namely BayesPrompt, to learn prompts that contain domain-discriminative information against interference from domain-irrelevant knowledge. BayesPrompt first leverages known distributions to approximate the debiased factual distributions of target domains and then uniformly samples certain representative features from the approximated distributions to generate the ultimate prompts for PLMs. We provide theoretical insights through a connection to domain adaptation. Empirically, our method achieves state-of-the-art performance on benchmarks.
Submitted 20 March, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Extended imaginary gauge transformation in a general nonreciprocal lattice
Authors:
Yunyao Qi,
Jinghui Pi,
Yuquan Wu,
Heng Lin,
Chao Zheng,
Guilu Long
Abstract:
Imaginary gauge transformation (IGT) provides a clear understanding of the non-Hermitian skin effect by transforming non-Hermitian Hamiltonians with real spectra into Hermitian ones. In this work, we extend this approach to the complex-spectrum regime in a general nonreciprocal lattice model. We reveal that the validity of IGT hinges on a class of pseudo-Hermitian symmetry. The generalized Brillouin zone of a Hamiltonian respecting such pseudo-Hermiticity is shown to be a circle, which gives easy access to the continuum bands, the localization length of skin modes, and relevant topological numbers. Furthermore, we investigate the applicability of IGT and the underlying pseudo-Hermiticity beyond nearest-neighbour hopping, offering a graphical interpretation. Our theoretical framework is applied to establish bulk-boundary correspondence in the nonreciprocal trimer Su-Schrieffer-Heeger model and to analyze the localization behaviors of skin modes in the two-dimensional Hatano-Nelson model.
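The textbook case of IGT is the nearest-neighbour Hatano-Nelson chain with open boundaries and real hoppings t_R (rightward) and t_L (leftward) of the same sign. Conjugating H by S = diag(r⁰, r¹, …, r^(n−1)) with r = √(t_R/t_L) makes both hoppings equal to √(t_R t_L), so the transformed matrix is Hermitian. The eigenvectors of the original H are then ψ_j = r^j φ_j, which gives the skin-mode localization length ξ = 1/|ln r|. This minimal sketch covers only that real-spectrum, nearest-neighbour case. The paper's extension to complex spectra and longer-range hopping is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]

def hatano_nelson_obc(n: int, t_right: float, t_left: float) -> Matrix:
    """Open-boundary Hatano-Nelson Hamiltonian with asymmetric nearest-neighbour hopping."""
    h = [[0.0] * n for _ in range(n)]
    for j in range(n - 1):
        h[j + 1][j] = t_right  # hop from site j to site j+1
        h[j][j + 1] = t_left   # hop from site j+1 to site j
    return h

def imaginary_gauge_transform(h: Matrix, r: float) -> Matrix:
    """Return S^{-1} H S with S = diag(r^0, r^1, ..., r^{n-1})."""
    n = len(h)
    return [[h[i][j] * r ** (j - i) for j in range(n)] for i in range(n)]

def skin_localization_length(t_right: float, t_left: float) -> float:
    """xi = 1/|ln r|; modes pile up at the right edge when t_right > t_left (r > 1)."""
    r = math.sqrt(t_right / t_left)  # requires t_right * t_left > 0
    return 1.0 / abs(math.log(r))
```

For t_R = 1.5 and t_L = 0.5, the transformed chain has uniform symmetric hopping √0.75 ≈ 0.866, and all eigenstates of the original chain accumulate at the right boundary.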
Submitted 23 January, 2024;
originally announced January 2024.
-
T2MAC: Targeted and Trusted Multi-Agent Communication through Selective Engagement and Evidence-Driven Integration
Authors:
Chuxiong Sun,
Zehua Zang,
Jiabao Li,
Jiangmeng Li,
Xiao Xu,
Rui Wang,
Changwen Zheng
Abstract:
Communication stands as a potent mechanism to harmonize the behaviors of multiple agents. However, existing works primarily concentrate on broadcast communication, which not only lacks practicality but also leads to information redundancy. This surplus, one-size-fits-all information can adversely impact communication efficiency. Furthermore, existing works often resort to basic mechanisms to integrate observed and received information, impairing the learning process. To tackle these difficulties, we propose Targeted and Trusted Multi-Agent Communication (T2MAC), a straightforward yet effective method that enables agents to learn selective engagement and evidence-driven integration. With T2MAC, agents can craft individualized messages, pinpoint ideal communication windows, and engage with reliable partners, thereby improving communication efficiency. After receiving messages, the agents integrate information observed and received from different sources at the evidence level. This process enables agents to collectively use evidence garnered from multiple perspectives, fostering trusted and cooperative behaviors. We evaluate our method on a diverse set of cooperative multi-agent tasks of varying difficulty and scale, including Hallway, MPE, and SMAC. The experiments indicate that the proposed model not only surpasses state-of-the-art methods in cooperative performance and communication efficiency but also exhibits impressive generalization.
Submitted 19 January, 2024;
originally announced January 2024.
-
Algebraic structure of the Gaussian-PDMF space and applications on fuzzy equations
Authors:
Chuang Zheng
Abstract:
In this paper, we extend the research presented in [Wang and Zheng, Fuzzy Sets and Systems, p108581, 2023] by establishing the algebraic structure of the Gaussian Probability Density Membership Function (Gaussian-PDMF) space. We consider fixed objective and subjective entities, denoted as $(h,p)$, and provide the explicit form of the membership function. Consequently, every fuzzy number with the membership function in $X_{h,p}(\mathbb{R})$, denoted as $\tilde{x}$, can be uniquely identified by a vector $\langle x; d^-, d^+, μ^-,μ^+\rangle$. Here, $x\in \mathbb{R}$ represents the "leading factor" of the fuzzy number $\tilde{x}$ with a membership degree equal to $1$. The parameters $d^-$ (left side) and $d^+$ (right side) denote the lengths of the compact support, while $μ^-$ (left side) and $μ^+$ (right side) represent the shapes. We introduce five operators: addition, subtraction, multiplication, scalar multiplication, and division. We demonstrate that, based on our definitions, the Gaussian-PDMF space exhibits a well-defined algebraic structure. For instance, $X_{h,p}(\mathbb{R})$ is a vector space over $\mathbb{R}$, featuring a subspace that forms a division ring, allowing for the representation of fuzzy polynomials, among other properties. We provide several examples to illustrate our theoretical results.
Submitted 5 December, 2023;
originally announced January 2024.
-
Detector performance of the Gamma-ray Transient Monitor onboard DRO-A Satellite
Authors:
Pei-Yi Feng,
Zheng-Hua An,
Da-Li Zhang,
Chen-Wei Wang,
Chao Zheng,
Sheng Yang,
Shao-Lin Xiong,
Jia-Cong Liu,
Xin-Qiao Li,
Ke Gong,
Xiao-Jing Liu,
Min Gao,
Xiang-Yang Wen,
Ya-Qing liu,
Xiao-Yun Zhao,
Fan Zhang,
Xi-Lei Sun,
Hong Lu
Abstract:
Gamma-ray Transient Monitor (GTM) is an all-sky monitor onboard the Distant Retrograde Orbit-A (DRO-A) satellite with the scientific objective of detecting gamma-ray transients ranging from 20 keV to 1 MeV. GTM is equipped with 5 Gamma-ray Transient Probe (GTP) detector modules, each using a NaI(Tl) scintillator coupled with a SiPM array. To reduce SiPM noise, GTP uses a dedicated dual-channel coincident readout design. In this work, we first studied the impact of different coincidence times on detection efficiency and ultimately selected a 500 ns coincidence window for offline data processing. To test the performance of the GTPs and validate the Monte Carlo simulated energy response, we conducted comprehensive ground calibration tests using the Hard X-ray Calibration Facility (HXCF) and radioactive sources, covering energy response, detection efficiency, spatial response, bias-voltage response, and temperature dependence. We present the ground calibration results in detail and validate the design and mass model of the GTP detector. This work paves the way for in-flight observation and science data analysis.
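The idea of a dual-channel time coincidence can be illustrated with a simple offline matcher: an event is kept only if both readout channels record a hit within the coincidence window (500 ns in the abstract). This is a minimal sketch assuming each channel yields a sorted list of timestamps in nanoseconds and that greedy one-to-one pairing is acceptable; the actual GTP trigger logic and data format are not described here, and `coincident_pairs` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def coincident_pairs(t_a: Sequence[float], t_b: Sequence[float],
                     window_ns: float = 500.0) -> List[Tuple[int, int]]:
    """Pair hits from two readout channels whose timestamps differ by <= window_ns.

    Both inputs must be sorted ascending (ns). Each hit is used at most once;
    the two-pointer scan runs in O(len(t_a) + len(t_b)).
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(t_a) and j < len(t_b):
        dt = t_b[j] - t_a[i]
        if abs(dt) <= window_ns:
            pairs.append((i, j))  # coincident hits: keep as one candidate event
            i += 1
            j += 1
        elif dt > 0:
            i += 1  # channel-A hit has no partner within the window: likely noise
        else:
            j += 1  # channel-B hit has no partner within the window: likely noise
    return pairs


# Usage with toy timestamps (ns); these are illustrative, not GTM data.
a_ns = [0.0, 1000.0, 5000.0, 9000.0]
b_ns = [200.0, 3000.0, 5400.0, 9600.0]
matched = coincident_pairs(a_ns, b_ns)
```

Widening the window admits more true events but also more chance coincidences of uncorrelated SiPM noise, which is the trade-off the coincidence-time study in the abstract addresses.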
Submitted 15 January, 2024;
originally announced January 2024.
-
Discrete conformal structures on surfaces with boundary (I) -- Classification
Authors:
Xu Xu,
Chao Zheng
Abstract:
In this paper, we introduce discrete conformal structures on surfaces with boundary via the axiomatic approach pioneered by Glickenstein \cite{Glickenstein}. This ensures that the Poincaré dual of an ideally triangulated surface with boundary has a good geometric structure. We then classify the discrete conformal structures on surfaces with boundary; the classification unifies and generalizes Guo-Luo's generalized circle packings \cite{GL2}, Guo's vertex scalings \cite{Guo} and Xu's discrete conformal structures \cite{Xu22} on surfaces with boundary. The relationships between discrete conformal structures on surfaces with boundary and 3-dimensional hyperbolic geometry are also discussed.
Submitted 10 January, 2024;
originally announced January 2024.
-
A discrete uniformization theorem for decorated piecewise hyperbolic metrics on surfaces
Authors:
Xu Xu,
Chao Zheng
Abstract:
In this paper, we study a natural discretization of the smooth Gaussian curvature on surfaces. A discrete uniformization theorem is established for this discrete Gaussian curvature. We further investigate the prescribed combinatorial curvature problem for a parametrization of this discrete Gaussian curvature, called the combinatorial $α$-curvature. To find decorated piecewise hyperbolic metrics with prescribed combinatorial $α$-curvatures, we introduce the combinatorial $α$-Ricci flow for decorated piecewise hyperbolic metrics. To handle potential singularities along the combinatorial $α$-Ricci flow, we perform surgery along the flow by edge flipping under the weighted Delaunay condition. We then prove the long-time existence and convergence of the combinatorial $α$-Ricci flow with surgery. As an application, we prove the existence of decorated piecewise hyperbolic metrics with prescribed combinatorial $α$-curvatures. We further introduce the combinatorial $α$-Calabi flow with surgery and study its long-time behavior.
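For orientation, the classical discrete Gaussian curvature of a piecewise hyperbolic metric is the angle defect at a vertex: $2\pi$ minus the sum of triangle angles meeting there, with each angle obtained from edge lengths by the hyperbolic law of cosines. The sketch below computes only this classical quantity; the decorated curvature and the combinatorial $α$-curvature studied in the paper may be defined differently, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def hyperbolic_angle(a: float, b: float, c: float) -> float:
    """Interior angle opposite side a of a hyperbolic triangle (curvature -1).

    Hyperbolic law of cosines: cosh a = cosh b cosh c - sinh b sinh c cos(alpha).
    """
    cos_alpha = ((math.cosh(b) * math.cosh(c) - math.cosh(a))
                 / (math.sinh(b) * math.sinh(c)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_alpha)))


def angle_defect(incident: Iterable[Tuple[float, float, float]]) -> float:
    """Classical discrete curvature K_v = 2*pi - (sum of angles at vertex v).

    Each tuple (a, b, c) describes one triangle incident to v: a is the edge
    opposite v, and b, c are the two edges meeting at v.
    """
    return 2.0 * math.pi - sum(hyperbolic_angle(a, b, c) for a, b, c in incident)


# Usage: six equilateral hyperbolic triangles of side length 1 around a vertex.
k_v = angle_defect([(1.0, 1.0, 1.0)] * 6)
```

Hyperbolic triangles have angle sum below $\pi$, so six equilateral ones leave a positive defect, whereas in the Euclidean limit of tiny side lengths each angle tends to $\pi/3$ and the defect tends to zero.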
Submitted 10 January, 2024;
originally announced January 2024.