-
Fast Iterative Graph Computing with Updated Neighbor States
Authors:
Yijie Zhou,
Shufeng Gong,
Feng Yao,
Hanzhang Chen,
Song Yu,
Pengxi Liu,
Yanfeng Zhang,
Ge Yu,
Jeffrey Xu Yu
Abstract:
Enhancing the efficiency of iterative computation on graphs has garnered considerable attention in both industry and academia. Nonetheless, the majority of efforts focus on expediting iterative computation by minimizing the running time per iteration step, ignoring the optimization of the number of iteration rounds, which is a crucial aspect of iterative computation. We experimentally verified the correlation between the vertex processing order and the number of iterative rounds, thus making it possible to reduce the number of execution rounds for iterative computation. In this paper, we propose a graph reordering method, GoGraph, which can construct a well-formed vertex processing order effectively reducing the number of iteration rounds and, consequently, accelerating iterative computation. Before delving into GoGraph, a metric function is introduced to quantify the efficiency of vertex processing order in accelerating iterative computation. This metric reflects the quality of the processing order by counting the number of edges whose source precedes the destination. GoGraph employs a divide-and-conquer mindset to establish the vertex processing order by maximizing the value of the metric function. Our experimental results show that GoGraph outperforms current state-of-the-art reordering algorithms by 1.83x on average (up to 3.34x) in runtime.
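Illustrative sketch (not taken from the paper): the metric described in the abstract above can be read as a plain count of edges whose source comes earlier than its destination in the processing order. The function name `forward_edge_count` and the toy graph are hypothetical, and the paper's exact definition may differ (for example, it may weight edges).

```python
def forward_edge_count(edges, order):
    """Count edges (src, dst) whose source precedes the destination in `order`.

    `edges` is an iterable of (src, dst) pairs and `order` is a sequence that
    lists every vertex exactly once (both layouts are assumptions).
    """
    # Map each vertex to its position in the processing order.
    position = {vertex: index for index, vertex in enumerate(order)}
    return sum(1 for src, dst in edges if position[src] < position[dst])


# Hypothetical toy graph with one cycle (a -> b -> c -> a) and a shortcut a -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
good_order = ["a", "b", "c"]
bad_order = ["c", "b", "a"]

print(forward_edge_count(edges, good_order))  # 3
print(forward_edge_count(edges, bad_order))   # 1
```

Under this reading, a higher count means more vertices are processed after their in-neighbors, so they can presumably use neighbor states that were already updated in the same round.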
Submitted 16 July, 2024;
originally announced July 2024.
-
A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction
Authors:
Fulong Yao,
Wanqing Zhao,
Matthew Forshaw,
Yang Song
Abstract:
This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction. Differing from the traditional six-layer IT2FNN, a nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability. First, a new co-antecedent layer and a modified consequent layer are devised to improve the interpretability of the fuzzy model for multi-step predictions. Second, a new transformation layer is designed to address the potential issue of vanishing rule firing strength caused by high-dimensional inputs. Third, a new link layer is proposed to build temporal connections between multi-step predictions. Furthermore, a two-stage self-organizing mechanism is developed to automatically generate the fuzzy rules, in which the first stage creates the rule base from scratch and performs the initial optimization, while the second stage fine-tunes all network parameters. Finally, various simulations are carried out on chaotic and microgrid time series prediction problems, demonstrating the superiority of our approach in terms of prediction accuracy, uncertainty handling and model interpretability.
Submitted 10 July, 2024;
originally announced July 2024.
-
Data Contamination Can Cross Language Barriers
Authors:
Feng Yao,
Yufan Zhuang,
Zihao Sun,
Sunan Xu,
Animesh Kumar,
Jingbo Shang
Abstract:
The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingual form of contamination that inflates LLMs' performance while evading current detection methods, deliberately injected by overfitting LLMs on the translated versions of benchmark test sets. Then, we propose generalization-based approaches to unmask such deeply concealed contamination. Specifically, we examine the LLM's performance change after modifying the original benchmark by replacing the false answer choices with correct ones from other questions. Contaminated models can hardly generalize to such easier situations, where the false choices can be "not even wrong", as all choices are correct in their memorization. Experimental results demonstrate that cross-lingual contamination can easily fool existing detection methods, but not ours. In addition, we discuss the potential utilization of cross-lingual contamination in interpreting LLMs' working mechanisms and in post-training LLMs for enhanced multilingual capabilities. The code and dataset we use can be obtained from https://github.com/ShangDataLab/Deep-Contam.
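Illustrative sketch (not the authors' code): the benchmark modification described in the abstract above, replacing each false answer choice with a correct answer taken from another question, might look roughly like this. The item layout (`question`, `choices`, `answer` index), the function name `generalize_item`, and the uniform sampling are all assumptions made for illustration.

```python
import random


def generalize_item(items, idx, rng):
    """Return a copy of items[idx] whose false choices are replaced by
    correct answers drawn from other questions (hypothetical data layout)."""
    item = items[idx]
    correct = item["choices"][item["answer"]]
    # Collect the correct answers of all other questions, without duplicates
    # and without the current question's own correct answer.
    pool = [it["choices"][it["answer"]] for j, it in enumerate(items) if j != idx]
    pool = [c for c in dict.fromkeys(pool) if c != correct]
    n_false = len(item["choices"]) - 1
    if len(pool) < n_false:
        raise ValueError("not enough other questions to draw replacements from")
    replacements = iter(rng.sample(pool, n_false))
    # Keep the correct choice in place; swap every false choice.
    new_choices = [
        choice if k == item["answer"] else next(replacements)
        for k, choice in enumerate(item["choices"])
    ]
    return {"question": item["question"], "choices": new_choices, "answer": item["answer"]}


# Hypothetical toy benchmark.
items = [
    {"question": "Capital of France?", "choices": ["Paris", "Rome", "Oslo"], "answer": 0},
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": 1},
    {"question": "Largest planet?", "choices": ["Mars", "Venus", "Jupiter"], "answer": 2},
    {"question": "Chemical symbol for water?", "choices": ["CO2", "H2O", "O2"], "answer": 1},
]
modified = generalize_item(items, 0, random.Random(0))
print(modified["choices"])
```

In the modified item the correct answer stays where it was, while every distractor is a string that is a correct answer to some other question; a model that has merely memorized correct answers could plausibly be confused by such an item.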
Submitted 19 June, 2024;
originally announced June 2024.
-
ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery
Authors:
Xian Sun,
Qiwei Yan,
Chubo Deng,
Chenglong Liu,
Yi Jiang,
Zhongyan Hou,
Wanxuan Lu,
Fanglong Yao,
Xiaoyu Liu,
Lingxiang Hao,
Hongfeng Yu
Abstract:
Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task aimed at extracting entities (such as objects) and their interrelationships from images. Significant progress has been made in the study of SGG in natural images in recent years, but its exploration in the domain of remote sensing images remains very limited. The complex characteristics of remote sensing images necessitate higher time and manual interpretation costs for annotation compared to natural images. The lack of a large-scale public SGG benchmark is a major impediment to the advancement of SGG-related research in aerial imagery. In this paper, we introduce ReCon1M, the first publicly available large-scale, million-level relation dataset in the field of remote sensing images. Specifically, our dataset is built upon Fair1M and comprises 21,392 images. It includes annotations for 859,751 object bounding boxes across 60 different categories, and 1,149,342 relation triplets across 64 categories based on these bounding boxes. We provide a detailed description of the dataset's characteristics and statistical information. We conducted two object detection tasks and three sub-tasks within SGG on this dataset, assessing the performance of mainstream methods on these tasks.
Submitted 10 June, 2024;
originally announced June 2024.
-
Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs
Authors:
Shang Zhou,
Feng Yao,
Chengyu Dong,
Zihan Wang,
Jingbo Shang
Abstract:
Controlling the attribute intensity of text generation is crucial across scenarios (e.g., writing conciseness, chatting emotion, and explanation clarity). The remarkable capabilities of large language models (LLMs) have revolutionized text generation, prompting us to explore such "smooth control" of LLM generation. Specifically, we propose metrics to assess the range, calibration, and consistency of the generated text's attribute intensity in response to varying control values, as well as its relevance to the intended context. To quantify the attribute intensity and context relevance, we propose an effective evaluation framework leveraging the Elo rating system and GPT4, both renowned for their robust alignment with human judgment. We look into two viable training-free methods for achieving smooth control of LLMs: (1) Prompting with semantic shifters, and (2) Modifying internal model representations. The evaluations of these two methods are conducted on 5 different attributes with various models. Our code and dataset can be obtained from https://github.com/ShangDataLab/Smooth-Control.
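Illustrative sketch (not the authors' framework): the abstract above mentions the Elo rating system. The standard Elo update after one pairwise comparison is shown below; the scale constant 400 and `k=32` are conventional defaults rather than values from the paper, and how GPT4 judgments are mapped to scores (here 1.0 for a win, 0.5 for a tie, 0.0 for a loss) is an assumption.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    `score_a` is 1.0 if A is judged to win, 0.5 for a tie, 0.0 if A loses.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two texts start at the same rating; the judge says text A shows the
# attribute more intensely.
new_a, new_b = elo_update(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

Repeating such updates over many judged pairs yields a scalar rating per text, which is one way pairwise judgments could be turned into an intensity score.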
Submitted 6 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors give $\sin^2 2θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
-
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
Authors:
Yuwei Cheng,
Fan Yao,
Xuefeng Liu,
Haifeng Xu
Abstract:
This paper studies Learning from Imperfect Human Feedback (LIHF), motivated by humans' potential irrationality or imperfect perception of true preference. We revisit the classic dueling bandit problem as a model of learning from comparative human feedback, and enrich it by casting the imperfection in human feedback as agnostic corruption to user utilities. We start by identifying the fundamental limits of LIHF and prove a regret lower bound of $Ω(\max\{T^{1/2},C\})$, even when the total corruption $C$ is known and when the corruption decays gracefully over time (i.e., user feedback becomes increasingly more accurate). We then turn to design robust algorithms applicable in real-world scenarios with arbitrary corruption and unknown $C$. Our key finding is that gradient-based algorithms enjoy a smooth efficiency-robustness tradeoff under corruption by varying their learning rates. Specifically, under general concave user utility, Dueling Bandit Gradient Descent (DBGD) of Yue and Joachims (2009) can be tuned to achieve regret $O(T^{1-α} + T^{ α} C)$ for any given parameter $α\in (0, \frac{1}{4}]$. Additionally, this result enables us to pin down the regret lower bound of the standard DBGD (the $α=1/4$ case) as $Ω(T^{3/4})$ for the first time, to the best of our knowledge. For strongly concave user utility we show a better tradeoff: there is an algorithm that achieves $O(T^α + T^{\frac{1}{2}(1-α)}C)$ for any given $α\in [\frac{1}{2},1)$. Our theoretical insights are corroborated by extensive experiments on real-world recommendation data.
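Worked example (plain arithmetic on the bounds quoted in the abstract above, not an additional result from the paper): substituting specific values of $α$ shows the efficiency-robustness tradeoff. The choice $α = 1/8$ is picked here only for illustration.

```latex
\begin{align*}
\text{General concave utility, } \alpha = \tfrac{1}{4}: &\quad
  O\!\left(T^{1-\alpha} + T^{\alpha} C\right) = O\!\left(T^{3/4} + T^{1/4} C\right),
  \text{ which stays } O\!\left(T^{3/4}\right) \text{ whenever } C = O\!\left(T^{1/2}\right). \\
\text{General concave utility, } \alpha = \tfrac{1}{8}: &\quad
  O\!\left(T^{7/8} + T^{1/8} C\right),
  \text{ which is sublinear whenever } C = o\!\left(T^{7/8}\right). \\
\text{Strongly concave utility, } \alpha = \tfrac{1}{2}: &\quad
  O\!\left(T^{\alpha} + T^{\frac{1}{2}(1-\alpha)} C\right) = O\!\left(T^{1/2} + T^{1/4} C\right).
\end{align*}
```

In the general concave case, a smaller $α$ gives a worse corruption-free term $T^{1-α}$ but a smaller multiplier $T^{α}$ on the corruption $C$, so more corruption can be tolerated before the bound becomes linear in $T$.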
Submitted 18 May, 2024;
originally announced May 2024.
-
Energy Efficiency Optimization of Multi-unit System with Different Devices
Authors:
Fulai Yao
Abstract:
The energy efficiency optimization of the power generation system and the energy efficiency optimization of the energy consumption system are unified into the same optimization problem, and a simple method to achieve energy efficiency optimization without establishing an accurate mathematical model of the system is proposed. For systems with similar energy efficiency, it is proved that the best load distribution method between equipment is to keep the operating energy efficiency of each operating device equal (Yao's Theorem 1). It is proved that the optimal switching method for the number of operating units between equipment with different energy efficiency is to keep the energy efficiency of the switching point equal, or at the maximum load point of the equipment (Yao's Theorem 2). This article gives two cases: a system composed of equipment with similar efficiency and a system composed of equipment with different efficiency.
Submitted 29 April, 2024;
originally announced April 2024.
-
User Welfare Optimization in Recommender Systems with Competing Content Creators
Authors:
Fan Yao,
Yiming Liao,
Mingzhe Wu,
Chuanhao Li,
Yan Zhu,
James Yang,
Qifan Wang,
Haifeng Xu,
Hongning Wang
Abstract:
Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global user preference distribution often traps the competition, especially the creators, in states that yield sub-optimal user welfare. To encourage creators to best serve a broad user population with relevant content, it becomes the platform's responsibility to leverage its information advantage regarding user preference distribution to accurately signal creators.
In this study, we perform system-side user welfare optimization under a competitive game setting among content creators. We propose an algorithmic solution for the platform, which dynamically computes a sequence of weights for each user based on their satisfaction with the recommended content. These weights are then utilized to design mechanisms that adjust the recommendation policy or the post-recommendation rewards, thereby influencing creators' content production strategies. To validate the effectiveness of our proposed method, we report our findings from a series of experiments, including: 1. a proof-of-concept negative example illustrating how creators' strategies converge towards sub-optimal states without platform intervention; 2. offline experiments employing our proposed intervention mechanisms on diverse datasets; and 3. results from a three-week online experiment conducted on a leading short-video recommendation platform.
Submitted 28 April, 2024;
originally announced April 2024.
-
Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph
Authors:
Xiaochen Kev Gao,
Feng Yao,
Kewen Zhao,
Beilei He,
Animesh Kumar,
Vish Krishnan,
Jingbo Shang
Abstract:
Model scaling is becoming the default choice for many language tasks due to the success of large language models (LLMs). However, it can fall short in specific scenarios where simple customized methods excel. In this paper, we delve into the patent approval prediction task and unveil that simple domain-specific graph methods outperform enlarging the model, using the intrinsic dependencies within the patent data. Specifically, we first extend the embedding-based state-of-the-art (SOTA) by scaling up its backbone model with various sizes of open-source LLMs, then explore prompt-based methods to harness proprietary LLMs' potential, but find the best results close to random guessing, underlining the ineffectiveness of model scaling-up. Hence, we propose a novel Fine-grained cLAim depeNdency (FLAN) Graph through meticulous patent data analyses, capturing the inherent dependencies across segments of the patent text. As it is model-agnostic, we apply cost-effective graph models to our FLAN Graph to obtain representations for approval prediction. Extensive experiments and detailed analyses prove that incorporating FLAN Graph via various graph models consistently outperforms all LLM baselines significantly. We hope that our observations and analyses in this paper can bring more attention to this challenging task and prompt further research into the limitations of LLMs. Our source code and dataset can be obtained from http://github.com/ShangDataLab/FLAN-Graph.
Submitted 22 April, 2024;
originally announced April 2024.
-
Search for a sub-eV sterile neutrino using Daya Bay's full dataset
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
Y. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\overline{ν}_{e}$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$.
Submitted 15 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Total Gluon Helicity from Lattice without Effective Theory Matching
Authors:
Zhuoyi Pang,
Fei Yao,
Jian-Hui Zhang
Abstract:
We propose two approaches for extracting the total gluon helicity contribution to proton spin from lattice QCD, one from local operator matrix elements in a fixed gauge accessible on lattice with feasible renormalization, and the other from gauge-invariant nonlocal gluon correlators. Neither of these approaches requires a matching procedure when converted to the MS scheme. Our proposal resolves a long-standing inconsistency in the literature regarding lattice calculations of the total gluon helicity, and has the potential to greatly facilitate these calculations.
Submitted 29 June, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
MetaIE: Distilling a Meta Model from LLM for All Kinds of Information Extraction Tasks
Authors:
Letian Peng,
Zilong Wang,
Feng Yao,
Zihan Wang,
Jingbo Shang
Abstract:
Information extraction (IE) is a fundamental area in natural language processing where prompting large language models (LLMs), even with in-context examples, cannot defeat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as a label-to-span matching. In this paper, we propose a novel framework MetaIE to build a small LM as a meta-model by learning to extract "important information", i.e., the meta-understanding of IE, so that this meta-model can be adapted to all kinds of IE tasks effectively and efficiently. Specifically, MetaIE obtains the small LM via a symbolic distillation from an LLM following the label-to-span scheme. We construct the distillation dataset via sampling sentences from language model pre-training datasets (e.g., OpenWebText in our implementation) and prompting an LLM to identify the typed spans of "important information". We evaluate the meta-model under the few-shot adaptation setting. Extensive results on 13 datasets from 6 IE tasks confirm that MetaIE can offer a better starting point for few-shot tuning on IE datasets and outperform other meta-models from (1) vanilla language model pre-training, (2) multi-IE-task pre-training with human annotations, and (3) single-IE-task symbolic distillation from LLM. Moreover, we provide comprehensive analyses of MetaIE, such as the size of the distillation dataset, the meta-model architecture, and the size of the meta-model.
Submitted 30 March, 2024;
originally announced April 2024.
-
Green's matching: an efficient approach to parameter estimation in complex dynamic systems
Authors:
Jianbin Tan,
Guoyu Zhang,
Xueqin Wang,
Hui Huang,
Fang Yao
Abstract:
Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statistically efficient two-step method, which only needs to approximate trajectories in dynamic systems but not their derivatives due to the inverse of differential operators by Green's function. This yields a statistically optimal guarantee for parameter estimation in general-order equations, a feature not shared by existing methods, and provides an efficient framework for broad statistical inferences in complex dynamic systems.
Submitted 21 March, 2024;
originally announced March 2024.
-
Spatially Randomized Designs Can Enhance Policy Evaluation
Authors:
Ying Yang,
Chengchun Shi,
Fang Yao,
Shouyang Wang,
Hongtu Zhu
Abstract:
This article studies the benefits of using spatially randomized experimental designs which partition the experimental area into distinct, non-overlapping units with treatments assigned randomly. Such designs offer improved policy evaluation in online experiments by providing more precise policy value estimators and more effective A/B testing algorithms than traditional global designs, which apply the same treatment across all units simultaneously. We examine both parametric and nonparametric methods for estimating and inferring policy values based on this randomized approach. Our analysis includes evaluating the mean squared error of the treatment effect estimator and the statistical power of the associated tests. Additionally, we extend our findings to experiments with spatio-temporal dependencies, where treatments are allocated sequentially over time, and account for potential temporal carryover effects. Our theoretical insights are supported by comprehensive numerical experiments.
Submitted 17 March, 2024;
originally announced March 2024.
-
Cooperative Classification and Rationalization for Graph Generalization
Authors:
Linan Yue,
Qi Liu,
Ye Liu,
Weibo Gao,
Fangzhou Yao,
Wenfeng Li
Abstract:
Graph Neural Networks (GNNs) have achieved impressive results in graph classification tasks, but they struggle to generalize effectively when faced with out-of-distribution (OOD) data. Several approaches have been proposed to address this problem. Among them, one solution is to diversify training distributions in vanilla classification by modifying the data environment, yet accessing the environment information is complex. Another promising approach involves rationalization, extracting invariant rationales for predictions. However, extracting rationales is difficult due to limited learning signals, resulting in less accurate rationales and diminished predictions. To address these challenges, in this paper, we propose a Cooperative Classification and Rationalization (C2R) method, consisting of the classification and the rationalization module. Specifically, we first assume that multiple environments are available in the classification module. Then, we introduce diverse training distributions using an environment-conditional generative network, enabling robust graph representations. Meanwhile, the rationalization module employs a separator to identify relevant rationale subgraphs while the remaining non-rationale subgraphs are de-correlated with labels. Next, we align graph representations from the classification module with rationale subgraph representations using the knowledge distillation methods, enhancing the learning signal for rationales. Finally, we infer multiple environments by gathering non-rationale representations and incorporate them into the classification module for cooperative learning. Extensive experimental results on both benchmarks and synthetic datasets demonstrate the effectiveness of C2R. Code is available at https://github.com/yuelinan/Codes-of-C2R.
Submitted 10 March, 2024;
originally announced March 2024.
-
A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning
Authors:
Fulong Yao,
Wanqing Zhao,
Matthew Forshaw,
Yang Song
Abstract:
The global energy landscape is undergoing a transformation towards decarbonization, sustainability, and cost-efficiency. In this transition, microgrid systems integrated with renewable energy sources (RES) and energy storage systems (ESS) have emerged as a crucial component. However, optimizing the operational control of such an integrated energy system lacks a holistic view of multiple environmental, infrastructural and economic considerations, not to mention the need to factor in the uncertainties from both the supply and demand. This paper presents a holistic data-driven power optimization approach based on deep reinforcement learning (DRL) for microgrid control considering the multiple needs of decarbonization, sustainability and cost-efficiency. First, two data-driven control schemes, namely the prediction-based (PB) and prediction-free (PF) schemes, are devised to formulate the control problem within a Markov decision process (MDP). Second, a multivariate objective (reward) function is designed to account for the market profits, carbon emissions, peak load, and battery degradation of the microgrid system. Third, we develop a Double Dueling Deep Q Network (D3QN) architecture to optimize the power flows for real-time energy management and determine charging/discharging strategies of ESS. Finally, extensive simulations are conducted to demonstrate the effectiveness and superiority of the proposed approach through a comparative analysis. The results and analysis also suggest the respective circumstances for using the two control schemes in practical implementations with uncertainties.
Submitted 1 March, 2024;
originally announced March 2024.
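The abstract above names a Double Dueling Deep Q Network (D3QN) without giving its equations. As a rough illustration only, the two textbook ingredients can be sketched in plain Python: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A), and the Double DQN target in which the online network selects the next action and the target network evaluates it. The function names, the list-based inputs, and the numbers are hypothetical and are not taken from the paper.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical next-state estimates for three actions (e.g. charge, idle, discharge).
q_online_next = dueling_q(1.0, [0.5, -0.5, 0.0])
q_target_next = [1.0, 1.2, 0.2]
target = double_dqn_target(2.0, 0.9, q_online_next, q_target_next, done=False)
```

Decoupling action selection from action evaluation is the usual argument for the "double" part, since it reduces the overestimation that a single max over noisy Q-values introduces.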
-
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Authors:
Liangyu Xu,
Wanxuan Lu,
Hongfeng Yu,
Fanglong Yao,
Xian Sun,
Kun Fu
Abstract:
Extrapolating future weather radar echoes from past observations is a complex task vital for precipitation nowcasting. The spatial morphology and temporal evolution of radar echoes exhibit a certain degree of correlation, yet they also possess independent characteristics. Existing methods learn unified spatial and temporal representations in a highly coupled feature space, emphasizing the correlation between spatial and temporal features but neglecting the explicit modeling of their independent characteristics, which may result in mutual interference between them. To effectively model the spatiotemporal dynamics of radar echoes, we propose a Spatial-Frequency-Temporal correlation-decoupling Transformer (SFTformer). The model leverages multiple stacked SFT-Blocks to not only mine the correlation of the spatiotemporal dynamics of echo cells but also avoid the mutual interference between the temporal modeling and the spatial morphology refinement by decoupling them. Furthermore, inspired by the practice that weather forecast experts effectively review historical echo evolution to make accurate predictions, SFTformer incorporates a joint training paradigm for historical echo sequence reconstruction and future echo sequence prediction. Experimental results on the HKO-7 dataset and ChinaNorth-2021 dataset demonstrate the superior performance of SFTformer in short-term (1 h), mid-term (2 h), and long-term (3 h) precipitation nowcasting.
Submitted 27 February, 2024;
originally announced February 2024.
-
Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?
Authors:
Fan Yao,
Chuanhao Li,
Denis Nekipelov,
Hongning Wang,
Haifeng Xu
Abstract:
The advent of generative AI (GenAI) technology produces transformative impact on the content creation landscape, offering alternative approaches to produce diverse, high-quality content across media, thereby reshaping online ecosystems but also raising concerns about market over-saturation and the potential marginalization of human creativity. Our work introduces a competition model generalized from the Tullock contest to analyze the tension between human creators and GenAI. Our theory and simulations suggest that despite challenges, a stable equilibrium between human and AI-generated content is possible. Our work contributes to understanding the competitive dynamics in the content creation industry, offering insights into the future interplay between human creativity and technological advancements in GenAI.
Submitted 23 February, 2024;
originally announced February 2024.
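The abstract above builds on the Tullock contest. For orientation, the standard Tullock contest success function gives contestant i the share x_i^r / sum_j x_j^r of the prize, where x_i is effort and r controls how decisive effort is. The sketch below implements only this textbook form; the paper's generalization is not described in the abstract and is not reproduced here, and the function name and the tie-breaking rule for zero total effort are assumptions.

```python
def tullock_shares(efforts, r=1.0):
    """Standard Tullock contest success function: share_i = x_i**r / sum_j x_j**r."""
    weights = [e ** r for e in efforts]
    total = sum(weights)
    if total == 0:
        # Assumed convention: split evenly when nobody exerts effort.
        return [1.0 / len(efforts)] * len(efforts)
    return [w / total for w in weights]


# Hypothetical efforts for a human creator (1.0) and a GenAI-assisted creator (3.0).
shares = tullock_shares([1.0, 3.0])
```

Raising r above 1 makes the contest more winner-take-all, while r below 1 flattens the shares toward an even split.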
-
First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9~GeV, 64.7~GeV, and 143.0~GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30\%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.
Submitted 7 February, 2024;
originally announced February 2024.
-
Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective
Authors:
Feiyu Yao,
Zongkai Wu,
Li Yi
Abstract:
Estimating 3D full-body pose from sparse sensor data is a pivotal technique employed for the reconstruction of realistic human motions in Augmented Reality and Virtual Reality. However, translating sparse sensor signals into comprehensive human motion remains a challenge since the sparsely distributed sensors in common VR systems fail to capture the motion of the full human body. In this paper, we use a well-designed Body Pose Graph (BPG) to represent the human body and translate the challenge into a prediction problem of missing graph nodes. Then, we propose a novel full-body motion reconstruction framework based on BPG. To establish BPG, nodes are initially endowed with features extracted from sparse sensor signals. Features from identifiable joint nodes across diverse sensors are amalgamated and processed from both temporal and spatial perspectives. Temporal dynamics are captured using the Temporal Pyramid Structure, while spatial relations in joint movements inform the spatial attributes. The resultant features serve as the foundational elements of the BPG nodes. To further refine the BPG, node features are updated through a graph neural network that incorporates edges reflecting varying joint relations. Our method's effectiveness is evidenced by the attained state-of-the-art performance, particularly in lower body motion, outperforming other baseline methods. Additionally, an ablation study validates the efficacy of each module in our proposed framework.
Submitted 22 January, 2024;
originally announced January 2024.
-
Charged-current non-standard neutrino interactions at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.
Submitted 19 March, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Zero-1-to-3: Domain-level Zero-shot Cognitive Diagnosis via One Batch of Early-bird Students towards Three Diagnostic Objectives
Authors:
Weibo Gao,
Qi Liu,
Hao Wang,
Linan Yue,
Haoyang Bi,
Yin Gu,
Fangzhou Yao,
Zheng Zhang,
Xin Li,
Yuanjing He
Abstract:
Cognitive diagnosis seeks to estimate the cognitive states of students by exploring their logged practice quiz data. It plays a pivotal role in personalized learning guidance within intelligent education systems. In this paper, we focus on an important, practical, yet often underexplored task: domain-level zero-shot cognitive diagnosis (DZCD), which arises due to the absence of student practice logs in newly launched domains. Recent cross-domain diagnostic models have been demonstrated to be a promising strategy for DZCD. These methods primarily focus on how to transfer student states across domains. However, they might inadvertently incorporate non-transferable information into student representations, thereby limiting the efficacy of knowledge transfer. To tackle this, we propose Zero-1-to-3, a domain-level zero-shot cognitive diagnosis framework via one batch of early-bird students towards three diagnostic objectives. Our approach initiates with pre-training a diagnosis model with dual regularizers, which decouples student states into domain-shared and domain-specific parts. The shared cognitive signals can be transferred to the target domain, enriching the cognitive priors for the new domain, which ensures the cognitive state propagation objective. Subsequently, we devise a strategy to generate simulated practice logs for cold-start students through analyzing the behavioral patterns from early-bird students, fulfilling the domain-adaption goal. Consequently, we refine the cognitive states of cold-start students as diagnostic outcomes via virtual data, aligning with the diagnosis-oriented goal. Finally, extensive experiments on six real-world datasets highlight the efficacy of our model for DZCD and its practical application in question recommendation. The code is publicly available at https://github.com/bigdata-ustc/Zero-1-to-3.
Submitted 4 February, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Numerical approximation of discontinuous solutions of the semilinear wave equation
Authors:
Jiachuan Cao,
Buyang Li,
Yanping Lin,
Fangyan Yao
Abstract:
A fully discrete low-regularity integrator with high-frequency recovery techniques is constructed to approximate rough and possibly discontinuous solutions of the semilinear wave equation. The proposed method can capture the discontinuities of the solutions correctly without spurious oscillations and can approximate rough and discontinuous solutions with a higher convergence rate than pre-existing methods. Rigorous analysis is presented for the convergence rates of the proposed method in approximating solutions such that $(u,\partial_{t}u)\in C([0,T];H^γ\times H^{γ-1})$ for $γ\in(0,1]$. For discontinuous solutions of bounded variation in one dimension (which allow jump discontinuities), the proposed method is proved to have almost first-order convergence under the step size condition $τ\sim N^{-1}$, where $τ$ and $N$ denote the time step size and the number of Fourier terms in the space discretization, respectively. Extensive numerical examples are presented in both one and two dimensions to illustrate the advantages of the proposed method in improving the accuracy in approximating rough and discontinuous solutions of the semilinear wave equation. The numerical results are consistent with the theoretical results and show the efficiency of the proposed method.
Submitted 16 December, 2023;
originally announced December 2023.
-
Incorporating granularity bias as the margin into contrastive loss for video captioning
Authors:
Jiayang Gu,
Fengming Yao
Abstract:
Video captioning models easily suffer from long-tail distribution of phrases, which makes captioning models prone to generate vague sentences instead of accurate ones. However, existing debiasing strategies tend to export external knowledge to build dependency trees of words or refine frequency distribution by complex losses and extra input features, which lack interpretability and are hard to train. To mitigate the impact of granularity bias on the model, we introduced a statistical-based bias extractor. This extractor quantifies the information content within sentences and videos, providing an estimate of the likelihood that a video-sentence pair is affected by granularity bias. Furthermore, with the growing trend of integrating contrastive learning methods into video captioning tasks, we use a bidirectional triplet loss to get more negative samples in a batch. Subsequently, we incorporate the margin score into the contrastive learning loss, establishing distinct training objectives for head and tail sentences. This approach facilitates the model's training effectiveness on tail samples. Our simple yet effective loss, incorporating Granularity bias, is referred to as the Margin-Contrastive Loss (GMC Loss). The proposed model demonstrates state-of-the-art performance on MSRVTT with a CIDEr of 57.17, and MSVD, where CIDEr reaches up to 138.68.
Submitted 25 November, 2023;
originally announced November 2023.
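The abstract above combines a bidirectional triplet loss with a per-sample margin. The sketch below is a generic illustration of that idea and not the paper's GMC Loss: it takes a similarity matrix whose diagonal holds the matched video-sentence pairs, treats every other entry in the same row or column as an in-batch negative, and applies a hinge with a margin supplied per pair. How the paper derives the margin from its bias extractor is not stated in the abstract, so the margins here are plain inputs, and all names and numbers are hypothetical.

```python
def bidirectional_triplet_loss(sim, margins):
    """Hinge loss over in-batch negatives in both retrieval directions.

    sim[i][j] is the similarity of video i and sentence j; matched pairs sit
    on the diagonal. margins[i] is the margin applied to pair i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Video i against a mismatched sentence j.
            total += max(0.0, margins[i] + sim[i][j] - pos)
            # Sentence i against a mismatched video j.
            total += max(0.0, margins[i] + sim[j][i] - pos)
    return total / n


# Hypothetical 2x2 batch with a larger margin on the second pair.
sim = [[0.9, 0.2], [0.1, 0.8]]
loss = bidirectional_triplet_loss(sim, margins=[0.5, 1.0])
```

A larger margin on a pair forces its positive similarity to exceed the negatives by more, which is one way to give different training pressure to different samples.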
-
Preference Elicitation with Soft Attributes in Interactive Recommendation
Authors:
Erdem Biyik,
Fan Yao,
Yinlam Chow,
Alex Haig,
Chih-wei Hsu,
Mohammad Ghavamzadeh,
Craig Boutilier
Abstract:
Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their preferences for item characteristics. Unfortunately, users often wish to describe their preferences using soft attributes for which no ground-truth semantics is given. Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation. Our techniques query users using both items and soft attributes to update the recommender system's belief about their preferences to improve recommendation quality. We demonstrate the effectiveness of our methods vis-a-vis competing approaches on both synthetic and real-world datasets.
Submitted 22 October, 2023;
originally announced November 2023.
-
MUSER: A Multi-View Similar Case Retrieval Dataset
Authors:
Qingquan Li,
Yiran Hu,
Feng Yao,
Chaojun Xiao,
Zhiyuan Liu,
Maosong Sun,
Weixing Shen
Abstract:
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insightful reasoning process behind. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.
Submitted 24 October, 2023;
originally announced October 2023.
-
Learning prediction function of prior measures for statistical inverse problems of partial differential equations
Authors:
Junxiong Jia,
Deyu Meng,
Zongben Xu,
Fang Yao
Abstract:
In this paper, we view the statistical inverse problems of partial differential equations (PDEs) as PDE-constrained regression and focus on learning the prediction function of the prior probability measures. From this perspective, we propose general generalization bounds for learning infinite-dimensionally defined prior measures in the style of the probably approximately correct Bayesian learning theory. The theoretical framework is rigorously defined on infinite-dimensional separable function space, which makes the theories intimately connected to the usual infinite-dimensional Bayesian inverse approach. Inspired by the concept of $α$-differential privacy, a generalized condition (containing the usual Gaussian measures employed widely in the statistical inverse problems of PDEs) has been proposed, which allows the learned prior measures to depend on the measured data (the prediction function with measured data as input and the prior measure as output can be introduced). After illustrating the general theories, the specific settings of linear and nonlinear problems have been given and can be easily cast into our general theories to obtain concrete generalization bounds. Based on the obtained generalization bounds, infinite-dimensionally well-defined practical algorithms are formulated. Finally, numerical examples of the backward diffusion and Darcy flow problems are provided to demonstrate the potential applications of the proposed approach in learning the prediction function of the prior probability measures.
Submitted 18 October, 2023;
originally announced October 2023.
-
OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding
Authors:
Hao Peng,
Xiaozhi Wang,
Feng Yao,
Zimu Wang,
Chuzhao Zhu,
Kaisheng Zeng,
Lei Hou,
Juanzi Li
Abstract:
Event understanding aims at understanding the content and relationship of events within texts, which covers multiple complicated information extraction tasks: event detection, event argument extraction, and event relation extraction. To facilitate related research and application, we present an event understanding toolkit OmniEvent, which features three desiderata: (1) Comprehensive. OmniEvent supports mainstream modeling paradigms of all the event understanding tasks and the processing of 15 widely-used English and Chinese datasets. (2) Fair. OmniEvent carefully handles the inconspicuous evaluation pitfalls reported in Peng et al. (2023), which ensures fair comparisons between different models. (3) Easy-to-use. OmniEvent is designed to be easily used by users with varying needs. We provide off-the-shelf models that can be directly deployed as web services. The modular framework also enables users to easily implement and evaluate new event understanding models with OmniEvent. The toolkit (https://github.com/THU-KEG/OmniEvent) is publicly released along with the demonstration website and video (https://omnievent.xlore.cn/).
Submitted 25 September, 2023;
originally announced September 2023.
-
FedJudge: Federated Legal Large Language Model
Authors:
Linan Yue,
Qi Liu,
Yichao Du,
Weibo Gao,
Ye Liu,
Fangzhou Yao
Abstract:
Large Language Models (LLMs) have gained prominence in the field of Legal Intelligence, offering potential applications in assisting legal professionals and laymen. However, the centralized training of these Legal LLMs raises data privacy concerns, as legal data is distributed among various institutions containing sensitive individual information. This paper addresses this challenge by exploring the integration of Legal LLMs with Federated Learning (FL) methodologies. By employing FL, Legal LLMs can be fine-tuned locally on devices or clients, and their parameters are aggregated and distributed on a central server, ensuring data privacy without directly sharing raw data. However, computation and communication overheads hinder the full fine-tuning of LLMs under the FL setting. Moreover, the distribution shift of legal data reduces the effectiveness of FL methods. To this end, in this paper, we propose the first Federated Legal Large Language Model (FedJudge) framework, which fine-tunes Legal LLMs efficiently and effectively. Specifically, FedJudge utilizes parameter-efficient fine-tuning methods to update only a few additional parameters during the FL training. Besides, we explore the continual learning methods to preserve the global model's important parameters when training local clients to mitigate the problem of data shifts. Extensive experimental results on three real-world datasets clearly validate the effectiveness of FedJudge. Code is released at https://github.com/yuelinan/FedJudge.
Submitted 10 April, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
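The abstract above says that locally fine-tuned parameters are aggregated on a central server, but it does not name the aggregation rule. A common choice in federated learning is a data-size-weighted average of client parameters (FedAvg); the sketch below assumes that rule purely for illustration. The flat-list parameter representation, the function name, and the sample counts are hypothetical and are not taken from FedJudge.

```python
def federated_average(client_params, client_sizes):
    """Weighted average of client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 local samples; only the small set of
# trainable (e.g. adapter) parameters would be exchanged in a parameter-efficient setup.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors cross the network in this scheme, which is the sense in which raw legal data stays on each client.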
-
Towards the Identifiability and Explainability for Personalized Learner Modeling: An Inductive Paradigm
Authors:
Jiatong Li,
Qi Liu,
Fei Wang,
Jiayu Liu,
Zhenya Huang,
Fangzhou Yao,
Linbo Zhu,
Yu Su
Abstract:
Personalized learner modeling using cognitive diagnosis (CD), which aims to model learners' cognitive states by diagnosing learner traits from behavioral data, is a fundamental yet significant task in many web learning services. Existing cognitive diagnosis models (CDMs) follow the proficiency-response paradigm that views learner traits and question parameters as trainable embeddings and learns them through learner performance prediction. However, we notice that this paradigm leads to the inevitable non-identifiability and explainability overfitting problem, which is harmful to the quantification of learners' cognitive states and the quality of web learning services. To address these problems, we propose an identifiable cognitive diagnosis framework (ID-CDF) based on a novel response-proficiency-response paradigm inspired by encoder-decoder models. Specifically, we first devise the diagnostic module of ID-CDF, which leverages inductive learning to eliminate randomness in optimization to guarantee identifiability and captures the monotonicity between overall response data distribution and cognitive states to prevent explainability overfitting. Next, we propose a flexible predictive module for ID-CDF to ensure diagnosis preciseness. We further present an implementation of ID-CDF, i.e., ID-CDM, to illustrate its usability. Extensive experiments on four real-world datasets with different characteristics demonstrate that ID-CDF can effectively address the problems without loss of diagnosis preciseness.
Submitted 19 February, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Multiple antiferromagnetic phases and magnetic anisotropy in exfoliated CrBr$_3$ multilayers
Authors:
Fengrui Yao,
Volodymyr Multian,
Zhe Wang,
Nicolas Ubrig,
Jérémie Teyssier,
Fan Wu,
Enrico Giannini,
Marco Gibertini,
Ignacio Gutiérrez-Lezama,
Alberto F. Morpurgo
Abstract:
In twisted two-dimensional (2D) magnets, the stacking dependence of the magnetic exchange interaction can lead to regions of ferromagnetic and antiferromagnetic interlayer order, separated by non-collinear, skyrmion-like spin textures. Recent experimental searches for these textures have focused on CrI$_3$, known to exhibit either ferromagnetic or antiferromagnetic interlayer order, depending on layer stacking. However, the very strong uniaxial anisotropy of CrI$_3$ disfavors smooth non-collinear phases in twisted bilayers. Here, we report the experimental observation of three distinct magnetic phases -- one ferromagnetic and two antiferromagnetic -- in exfoliated CrBr$_3$ multilayers, and reveal that the uniaxial anisotropy is significantly smaller than in CrI$_3$. These results are obtained by magnetoconductance measurements on CrBr$_3$ tunnel barriers and Raman spectroscopy, in conjunction with density functional theory calculations, which enable us to identify the stackings responsible for the different interlayer magnetic couplings. The detection of all locally stable magnetic states predicted to exist in CrBr$_3$ and the excellent agreement found between theory and experiments, provide complete information on the stacking-dependent interlayer exchange energy and establish twisted bilayer CrBr$_3$ as an ideal system to deterministically create non-collinear magnetic phases.
Submitted 16 August, 2023;
originally announced August 2023.
-
Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals
Authors:
Fanglong Yao,
Changyuan Tian,
Jintao Liu,
Zequn Zhang,
Qing Liu,
Li Jin,
Shuchao Li,
Xiaoyu Li,
Xian Sun
Abstract:
Reasoning ability is one of the most crucial capabilities of a foundation model, signifying its capacity to address complex reasoning tasks. The Chain-of-Thought (CoT) technique is widely regarded as one of the effective methods for enhancing the reasoning ability of foundation models and has garnered significant attention. However, the reasoning process of CoT is linear and step-by-step, similar to personal logical reasoning, and suitable for solving general and slightly complicated problems. In contrast, the thinking pattern of an expert has two prominent characteristics that cannot be handled appropriately in CoT, i.e., high-order multi-hop reasoning and multimodal comparative judgement. Therefore, the core motivation of this paper is transcending CoT to construct a reasoning paradigm that can think like an expert. The hyperedge of a hypergraph can connect various vertices, making it naturally suitable for modelling high-order relationships. Inspired by this, this paper proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm, which enables foundation models to possess the expert-level ability of high-order multi-hop reasoning and multimodal comparative judgement. Specifically, a textual hypergraph-of-thought is constructed using triples as the primary thoughts to model higher-order relationships, and a hyperedge-of-thought is generated through multi-hop walking paths to achieve multi-hop inference. Furthermore, we devise a visual hypergraph-of-thought to interact with the textual hypergraph-of-thought via Cross-modal Co-Attention Graph Learning for multimodal comparative verification. Experiments on the ScienceQA benchmark demonstrate that the proposed HoT-based T5 outperforms CoT-based GPT3.5 and ChatGPT, and is on par with CoT-based GPT4 with a smaller model size.
Submitted 11 August, 2023;
originally announced August 2023.
-
Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial?
Authors:
Fan Yao,
Chuanhao Li,
Karthik Abinav Sankararaman,
Yiming Liao,
Yan Zhu,
Qifan Wang,
Hongning Wang,
Haifeng Xu
Abstract:
The past decade has witnessed the flourishing of a new profession as media content creators, who rely on revenue streams from online content recommendation platforms. The reward mechanism employed by these platforms creates a competitive environment among creators which affects their production choices and, consequently, content distribution and system welfare. It is thus crucial to design the platform's reward mechanism in order to steer the creators' competition towards a desirable welfare outcome in the long run. This work makes two major contributions in this regard: first, we uncover a fundamental limit of a class of widely adopted mechanisms, coined Merit-based Monotone Mechanisms, by showing that they inevitably lead to a constant fraction loss of the optimal welfare. To circumvent this limitation, we introduce Backward Rewarding Mechanisms (BRMs) and show that the competition game resulting from BRMs possesses a potential game structure. BRMs thus naturally induce strategic creators' collective behaviors towards optimizing the potential function, which can be designed to match any given welfare metric. In addition, the BRM class can be parameterized to allow the platform to directly optimize welfare within the feasible mechanism space even when the welfare metric is not explicitly defined.
Submitted 9 July, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation
Authors:
Hao Peng,
Xiaozhi Wang,
Feng Yao,
Kaisheng Zeng,
Lei Hou,
Juanzi Li,
Zhiyuan Liu,
Weixing Shen
Abstract:
Event extraction (EE) is a crucial task aiming at extracting events from texts, which includes two subtasks: event detection (ED) and event argument extraction (EAE). In this paper, we check the reliability of EE evaluations and identify three major pitfalls: (1) The data preprocessing discrepancy makes the evaluation results on the same dataset not directly comparable, but the data preprocessing details are not widely noted and specified in papers. (2) The output space discrepancy of different model paradigms makes different-paradigm EE models lack grounds for comparison and also leads to unclear mapping issues between predictions and annotations. (3) The absence of pipeline evaluation in many EAE-only works makes them hard to compare directly with EE works and may not reflect model performance well in real-world pipeline scenarios. We demonstrate the significant influence of these pitfalls through comprehensive meta-analyses of recent papers and empirical experiments. To avoid these pitfalls, we suggest a series of remedies, including specifying data preprocessing, standardizing outputs, and providing pipeline evaluation results. To help implement these remedies, we develop a consistent evaluation framework OMNIEVENT, which can be obtained from https://github.com/THU-KEG/OmniEvent.
Submitted 15 June, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Robust Functional Data Analysis for Discretely Observed Data
Authors:
Lingxuan Shao,
Fang Yao
Abstract:
This paper examines robust functional data analysis for discretely observed data, where the underlying process encompasses various distributions, such as heavy tail, skewness, or contaminations. We propose a unified robust concept of functional mean, covariance, and principal component analysis, while existing methods and definitions often differ from one another or only address fully observed functions (the ``ideal'' case). Specifically, the robust functional mean can deviate from its non-robust counterpart and is estimated using robust local linear regression. Moreover, we define a new robust functional covariance that shares useful properties with the classic version. Importantly, this covariance yields the robust version of Karhunen--Loève decomposition and corresponding principal components beneficial for dimension reduction. The theoretical results of the robust functional mean, covariance, and eigenfunction estimates, based on pooling discretely observed data (ranging from sparse to dense), are established and aligned with their non-robust counterparts. The newly-proposed perturbation bounds for estimated eigenfunctions, with indexes allowed to grow with sample size, lay the foundation for further modeling based on robust functional principal component analysis.
Submitted 25 May, 2023;
originally announced May 2023.
-
Dynamic Matrix Recovery
Authors:
Ziyuan Chen,
Ying Yang,
Fang Yao
Abstract:
Matrix recovery from sparse observations is an extensively studied topic emerging in various applications, such as recommendation systems and signal processing, which includes the matrix completion and compressed sensing models as special cases. In this work we propose a general framework for dynamic matrix recovery of low-rank matrices that evolve smoothly over time. We start from the setting in which the observations are independent across time, then extend to the setting in which both the design matrix and noise possess certain temporal correlation via modified concentration inequalities. By pooling neighboring observations, we obtain sharp estimation error bounds for both settings, showing the influence of the underlying smoothness, the dependence and effective samples. We propose a dynamic fast iterative shrinkage thresholding algorithm that is computationally efficient, and characterize the interplay between algorithmic and statistical convergence. Simulated and real data examples are provided to support such findings.
Submitted 21 November, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Fast Signal Region Detection with Application to Whole Genome Association Studies
Authors:
Wei Zhang,
Fan Wang,
Fang Yao
Abstract:
Research on the localization of the genetic basis associated with diseases or traits has been widely conducted in the last few decades. Scan methods have been developed for region-based analysis in whole-genome association studies, helping us better understand how genetics influences human diseases or traits, especially when the aggregated effects of multiple causal variants are present. In this paper, we propose a fast and effective algorithm coupled with a high-dimensional test for simultaneously detecting multiple signal regions, which is distinct from existing methods using scan or knockoff statistics. The idea is to conduct binary splitting with re-search and arrangement based on a sequence of dynamic critical values to increase detection accuracy and reduce computation. Theoretical and empirical studies demonstrate that our approach enjoys favorable theoretical guarantees with fewer restrictions and exhibits superior numerical performance with faster computation. Utilizing the UK Biobank data to identify the genetic regions related to breast cancer, we confirm previous findings and, meanwhile, identify a number of new regions which suggest strong association with the risk of breast cancer and deserve further investigation.
Submitted 8 February, 2024; v1 submitted 14 May, 2023;
originally announced May 2023.
-
Nonparametric regression for repeated measurements with deep neural networks
Authors:
Shunxing Yan,
Fang Yao
Abstract:
Analysis of repeated measurements for a sample of subjects has been intensively studied with several important branches developed, including longitudinal/panel/functional data analysis, while nonparametric regression of the mean function serves as a cornerstone that many statistical models are built upon. In this work, we investigate this problem using fully connected deep neural network (DNN) estimators with flexible shapes. A comprehensive theoretical framework is established by adopting empirical process techniques to tackle clustered dependence. We then derive the nearly optimal convergence rate of the DNN estimators in Hölder smoothness space, and illustrate the phase transition phenomenon inherent to repeated measurements and its connection to the curse of dimensionality. Furthermore, we study the function spaces with low intrinsic dimensions, including the hierarchical composition model, anisotropic Hölder smoothness and low-dimensional support set, and also obtain new approximation results and matching lower bounds to demonstrate the adaptivity of the DNN estimators for circumventing the curse of dimensionality.
Submitted 27 February, 2023;
originally announced February 2023.
-
How Bad is Top-$K$ Recommendation under Competing Content Creators?
Authors:
Fan Yao,
Chuanhao Li,
Denis Nekipelov,
Hongning Wang,
Haifeng Xu
Abstract:
Content creators compete for exposure on recommendation platforms, and such strategic behavior leads to a dynamic shift over the content distribution. However, how the creators' competition impacts user welfare and how the relevance-driven recommendation influences the dynamics in the long run are still largely unknown.
This work provides theoretical insights into these research questions. We model the creators' competition under the assumptions that: 1) the platform employs an innocuous top-$K$ recommendation policy; 2) user decisions follow the Random Utility model; 3) content creators compete for user engagement and, without knowing their utility function in hindsight, apply arbitrary no-regret learning algorithms to update their strategies. We study the user welfare guarantee through the lens of Price of Anarchy and show that the fraction of user welfare loss due to creator competition is always upper bounded by a small constant depending on $K$ and randomness in user decisions; we also prove the tightness of this bound. Our result discloses an intrinsic merit of the myopic approach to the recommendation, i.e., relevance-driven matching performs reasonably well in the long run, as long as users' decisions involve randomness and the platform provides reasonably many alternatives to its users.
Submitted 2 May, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
A New WISE Calibration of Stellar Mass
Authors:
T. H. Jarrett,
M. E. Cluver,
Edward N. Taylor,
Sabine Bellstedt,
A. S. G Robotham,
H. F. M. Yao
Abstract:
We derive new empirical scaling relations between WISE mid-infrared galaxy photometry and well-determined stellar masses from SED modeling of a suite of optical-infrared photometry provided by the DR4 Catalogue of the GAMA-KiDS-VIKING survey of the southern G23 field. The mid-infrared source extraction and characterization are drawn from the WISE Extended Source Catalogue (WXSC) and the archival ALLWISE catalog, combining both resolved and compact galaxies in the G23 sample to a redshift of 0.15. Three scaling relations are derived: W1 3.4 micron luminosity versus stellar mass, and WISE W1-W2, W1-W3 colors versus mass-to-light ratio (sensitive to a variety of galaxy types from passive to star-forming). For each galaxy in the sample, we then derive the combined stellar mass from these scaling relations, producing Mstellar estimates with better than $\sim$25-30% accuracy for galaxies with $>$10$^{9}$ Msolar and $<$40 - 50% for lower luminosity dwarf galaxies. We also provide simple prescriptions for rest-frame corrections and estimating stellar masses using only the W1 flux and the W1-W2 color, making stellar masses more accessible to users of the WISE data. Given a redshift or distance, these new scaling relations will enable stellar mass estimates for any galaxy in the sky detected by WISE with high fidelity across a range of mass-to-light.
Submitted 14 January, 2023;
originally announced January 2023.
-
Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments
Authors:
Ting Wang,
Zongkai Wu,
Feiyu Yao,
Donglin Wang
Abstract:
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment. Understanding the environment is a crucial part of the VLN-CE task, but existing methods understand the environment in a relatively simple and direct way, without delving into the relationship between language instructions and visual environments. Therefore, we propose a new environment representation to address these problems. First, we propose an Environment Representation Graph (ERG) built through object detection to express the environment at the semantic level. This operation enhances the relationship between language and environment. Then, the relational representations of object-object and object-agent pairs in the ERG are learned through a GCN, so as to obtain a continuous expression of the ERG. Subsequently, we combine the ERG expression with object label embeddings to obtain the environment representation. Finally, a new cross-modal attention navigation framework is proposed, incorporating our environment representation and a special loss function dedicated to training the ERG. Experimental results show that our method achieves satisfactory performance in terms of success rate on VLN-CE tasks. Further analysis shows that our method attains better cross-modal matching and strong generalization ability.
Submitted 11 January, 2023;
originally announced January 2023.
-
Connecting Euclidean to light-cone correlations: From flavor nonsinglet in forward kinematics to flavor singlet in non-forward kinematics
Authors:
Fei Yao,
Yao Ji,
Jian-Hui Zhang
Abstract:
We present a unified framework for the perturbative factorization connecting Euclidean correlations to light-cone correlations. Starting from nonlocal quark and gluon bilinear correlators, we derive the relevant hard-matching kernel up to next-to-leading order, both for the flavor singlet and non-singlet combinations, in non-forward and forward kinematics, and in coordinate and momentum space. The results for the generalized parton distributions (GPDs), parton distribution functions (PDFs), and distribution amplitudes (DAs) are obtained by choosing appropriate kinematics. The renormalization and matching are done in a state-of-the-art scheme. We also clarify some issues raised in the literature on the perturbative matching of GPDs. Our results provide a complete manual for extracting all leading-twist GPDs, PDFs, and DAs from lattice simulations of Euclidean correlations using a state-of-the-art strategy, in either the coordinate-space or the momentum-space factorization approach.
Submitted 14 November, 2023; v1 submitted 29 December, 2022;
originally announced December 2022.
-
Precision measurement of reactor antineutrino oscillation at kilometer-scale baselines by Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
Y. Y. Ding,
X. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
We present a new determination of the smallest neutrino mixing angle $θ_{13}$ and the mass-squared difference $Δ{\rm m}^{2}_{32}$ using a final sample of $5.55 \times 10^{6}$ inverse beta-decay (IBD) candidates with the final-state neutron captured on gadolinium. This sample was selected from the complete data set obtained by the Daya Bay reactor neutrino experiment in 3158 days of operation. Compared to the previous Daya Bay results, selection of IBD candidates has been optimized, energy calibration refined, and treatment of backgrounds further improved. The resulting oscillation parameters are ${\rm sin}^{2}2θ_{13} = 0.0851 \pm 0.0024$, $Δ{\rm m}^{2}_{32} = (2.466 \pm 0.060) \times 10^{-3}{\rm eV}^{2}$ for the normal mass ordering or $Δ{\rm m}^{2}_{32} = -(2.571 \pm 0.060) \times 10^{-3} {\rm eV}^{2}$ for the inverted mass ordering.
Submitted 27 November, 2022;
originally announced November 2022.
-
Theory of functional principal component analysis for discretely observed data
Authors:
Hang Zhou,
Dongyi Wei,
Fang Yao
Abstract:
Functional data analysis is an important research field in statistics which treats data as random functions drawn from some infinite-dimensional functional space, and functional principal component analysis (FPCA) based on eigen-decomposition plays a central role in data reduction and representation. After nearly three decades of research, a key problem remains unsolved, namely, the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This is fundamental for studying models and methods based on FPCA, yet there has not been substantial progress since the result of Hall, Müller and Wang (2006) for a fixed number of eigenfunction estimates. In this work, we aim to establish a unified theory for this problem, obtaining upper bounds for eigenfunctions with diverging indices in both the $\mathcal{L}^2$ and supremum norms, and deriving the asymptotic distributions of eigenvalues for a wide range of sampling schemes. Our results provide insight into the phenomenon when the $\mathcal{L}^{2}$ bound of eigenfunction estimates with diverging indices is minimax optimal as if the curves are fully observed, and reveal the transition of convergence rates from nonparametric to parametric regimes in connection to sparse or dense sampling. We also develop a double truncation technique to handle the uniform convergence of estimated covariance and eigenfunctions. The technical arguments in this work are useful for handling the perturbation series with noisy and discretely observed functional data and can be applied in models or those involving inverse problems based on FPCA as regularization, such as functional linear regression.
Submitted 1 April, 2024; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Resumming Quark's Longitudinal Momentum Logarithms in LaMET Expansion of Lattice PDFs
Authors:
Yushan Su,
Jack Holligan,
Xiangdong Ji,
Fei Yao,
Jian-Hui Zhang,
Rui Zhang
Abstract:
In the large-momentum expansion for parton distribution functions (PDFs), the natural physics scale is the longitudinal momentum ($p_z$) of the quarks (or gluons) in a large-momentum hadron. We show how to expose this scale dependence through resumming logarithms of the type $\ln^n p_z/μ$ in the matching coefficient, where $μ$ is a fixed renormalization scale. The result enhances the accuracy of the expansion at moderate $p_z>1$ GeV, and at the same time, clearly shows that the partons cannot be approximated from quarks with $p_z\sim Λ_{\rm QCD}$ which are not predominantly collinear with the parent hadron momentum, consistent with power counting of the large-momentum effective theory. The same physics mechanism constrains the coordinate space expansion at large distances $z$, the conjugate of $p_z$, as illustrated in the example of fitting the moments of the PDFs.
Submitted 29 March, 2023; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Nucleon Transversity Distribution in the Continuum and Physical Mass Limit from Lattice QCD
Authors:
Fei Yao,
Lisa Walter,
Jiunn-Wei Chen,
Jun Hua,
Xiangdong Ji,
Luchang Jin,
Sebastian Lahrtz,
Lingquan Ma,
Protick Mohanta,
Andreas Schäfer,
Hai-Tao Shu,
Yushan Su,
Peng Sun,
Xiaonu Xiong,
Yi-Bo Yang,
Jian-Hui Zhang
Abstract:
We report a state-of-the-art lattice QCD calculation of the isovector quark transversity distribution of the proton in the continuum and physical mass limit using large-momentum effective theory. The calculation is done at four lattice spacings $a=\{0.098,0.085,0.064,0.049\}$~fm and various pion masses ranging between $220$ and $350$ MeV, with proton momenta up to $2.8$ GeV. The result is non-perturbatively renormalized in the hybrid scheme with self renormalization which treats the infrared physics at large correlation distance properly, and extrapolated to the continuum, physical mass and infinite momentum limit. We also compare with recent global analyses for the nucleon isovector quark transversity distribution.
Submitted 24 February, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Connecting MeerKAT radio continuum properties to GAMA optical emission-line and WISE mid-infrared activity
Authors:
H. F. M. Yao,
M. E. Cluver,
T. H. Jarrett,
Gyula I. G. Jozsa,
M. G. Santos,
L. Marchetti,
M. J. I. Brown,
Y. A. Gordon,
S. Brough,
A. M. Hopkins,
B. W. Holwerda,
S. P. Driver,
E. M. Sadler
Abstract:
The identification of AGN in large surveys has been hampered by seemingly discordant classifications arising from differing diagnostic methods, usually tracing distinct processes specific to a particular wavelength regime. However, as shown in Yao et al. (2020), the combination of optical emission line measurements and mid-infrared photometry can be used to optimise the discrimination capability between AGN and star formation activity. In this paper we test our new classification scheme by combining the existing GAMA-WISE data with high-quality MeerKAT radio continuum data covering 8 deg$^2$ of the GAMA G23 region. Using this sample of 1 841 galaxies (z < 0.25), we investigate the total infrared (derived from 12$μ$m) to radio luminosity ratio, q(TIR), and its relationship to optical-infrared AGN and star-forming (SF) classifications. We find that while q(TIR) is efficient at detecting AGN activity in massive galaxies generally appearing quiescent in the infrared, it becomes less reliable for cases where the emission from star formation in the host galaxy is dominant. However, we find that the q(TIR) can identify up to 70 % more AGNs not discernible at optical and/or infrared wavelengths. The median q(TIR) of our SF sample is 2.57 $\pm$ 0.23 consistent with previous local universe estimates.
Submitted 2 August, 2022;
originally announced August 2022.
-
Re-thinking and Re-labeling LIDC-IDRI for Robust Pulmonary Cancer Prediction
Authors:
Hanxiao Zhang,
Xiao Gu,
Minghui Zhang,
Weihao Yu,
Liang Chen,
Zhexin Wang,
Feng Yao,
Yun Gu,
Guang-Zhong Yang
Abstract:
The LIDC-IDRI database is the most popular benchmark for lung cancer prediction. However, with subjective assessment from radiologists, nodules in LIDC may have entirely different malignancy annotations from the pathological ground truth, introducing label assignment errors and subsequent supervision bias during training. The LIDC database thus requires more objective labels for learning-based cancer prediction. Based on an extra small dataset containing 180 nodules diagnosed by pathological examination, we propose to re-label LIDC data to mitigate the effect of original annotation bias verified on this robust benchmark. We demonstrate in this paper that providing new labels by similar nodule retrieval based on metric learning would be an effective re-labeling strategy. Training on these re-labeled LIDC nodules leads to improved model performance, which is enhanced when new labels of uncertain nodules are added. We further infer that re-labeling LIDC is currently an expedient way to achieve robust lung cancer prediction, while building a large pathologically proven nodule database provides the long-term solution.
Submitted 28 July, 2022;
originally announced July 2022.
-
Band Gap Opening in Bilayer Graphene-CrCl$_3$/CrBr$_3$/CrI$_3$ van der Waals Interfaces
Authors:
Giulia Tenasini,
David Soler-Delgado,
Zhe Wang,
Fengrui Yao,
Dumitru Dumcenco,
Enrico Giannini,
Kenji Watanabe,
Takashi Taniguchi,
Christian Moulsdale,
Aitor Garcia-Ruiz,
Vladimir I. Fal'ko,
Ignacio Gutiérrez-Lezama,
Alberto F. Morpurgo
Abstract:
We report experimental investigations of transport through bilayer graphene (BLG)/chromium trihalide (CrX$_3$; X=Cl, Br, I) van der Waals interfaces. In all cases, a large charge transfer from BLG to CrX$_3$ takes place (reaching densities in excess of $10^{13}$ cm$^{-2}$), and generates an electric field perpendicular to the interface that opens a band gap in BLG. We determine the gap from the activation energy of the conductivity and find excellent agreement with the latest theory accounting for the contribution of the $σ$ bands to the BLG dielectric susceptibility. We further show that for BLG/CrCl$_3$ and BLG/CrBr$_3$ the band gap can be extracted from the gate voltage dependence of the low-temperature conductivity, and use this finding to refine the gap dependence on the magnetic field. Our results allow a quantitative comparison of the electronic properties of BLG with theoretical predictions and indicate that electrons occupying the CrX$_3$ conduction band are correlated.
Submitted 24 August, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.