Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

Authors: Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Aviv Regev, Sergey Levine, Masatoshi Uehara

Abstract: Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require "differentiable" proxy models (e.g., classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation. The code is available at https://github.com/masa-ue/SVDD.

Submitted 15 August, 2024; originally announced August 2024.

Comments: The code is available at https://github.com/masa-ue/SVDD

arXiv:2408.03415 [pdf, other]

A Novel Approximate Bayesian Inference Method for Compartmental Models in Epidemiology using Stan

Authors: Xiahui Li, Ben Swallow, Fergus J. Chadwick

Abstract: Mechanistic compartmental models are widely used in epidemiology to study the dynamics of infectious disease transmission. These models have significantly contributed to designing and evaluating effective control strategies during pandemics. However, the increasing complexity and the number of parameters needed to describe rapidly evolving transmission scenarios present significant challenges for parameter estimation due to intractable likelihoods. To overcome this issue, likelihood-free methods have proven effective for accurately and efficiently fitting these models to data. In this study, we focus on approximate Bayesian computation (ABC) and synthetic likelihood methods for parameter inference. We develop a method that employs ABC to select the most informative subset of summary statistics, which are then used to construct a synthetic likelihood for posterior sampling. Posterior sampling is performed using Hamiltonian Monte Carlo as implemented in the Stan software. The proposed algorithm is demonstrated through simulation studies, showing promising results for inference in a simulated epidemic scenario.

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2408.02045 [pdf, other]

DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

Authors: Qinshuo Liu, Zixin Wang, Xi-An Li, Xinyao Ji, Lei Zhang, Lin Liu, Zhonghua Liu

Abstract: Semiparametric statistics play a pivotal role in a wide range of domains, including but not limited to missing data, causal inference, and transfer learning. In many settings, semiparametric theory leads to (nearly) statistically optimal procedures that nevertheless involve numerically solving Fredholm integral equations of the second kind. Traditional numerical methods, such as polynomial or spline approximations, are difficult to scale to multi-dimensional problems. Alternatively, statisticians may choose to approximate the original integral equations by ones with closed-form solutions, resulting in computationally more efficient, but statistically suboptimal or even incorrect procedures. To bridge this gap, we propose a novel framework by formulating the semiparametric estimation problem as a bi-level optimization problem; and then we develop a scalable algorithm called Deep Neural-Nets Assisted Semiparametric Estimation (DNA-SE) by leveraging the universal approximation property of Deep Neural-Nets (DNN) to streamline semiparametric procedures. Through extensive numerical experiments and a real data analysis, we demonstrate the numerical and statistical advantages of DNA-SE over traditional methods. To the best of our knowledge, we are the first to bring DNN into semiparametric statistics as a numerical solver of integral equations in our proposed general framework.

Submitted 4 August, 2024; originally announced August 2024.

Comments: semiparametric statistics, missing data, causal inference, Fredholm integral equations of the second kind, bi-level optimization, deep learning, AI for science

arXiv:2407.19373 [pdf, other]

Uncertainty Quantification of Data Shapley via Statistical Inference

Authors: Mengmeng Wu, Zhihong Liu, Xiang Li, Ruoxi Jia, Xiangyu Chang

Abstract: As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within the machine learning landscape, Data Shapley stands out as a widely embraced method for data valuation. However, a limitation of Data Shapley is its assumption of a fixed dataset, contrasting with the dynamic nature of real-world applications where data constantly evolves and expands. This paper establishes the relationship between Data Shapley and infinite-order U-statistics and addresses this limitation by quantifying the uncertainty of Data Shapley with changes in data distribution from the perspective of U-statistics. We make statistical inferences on data valuation to obtain confidence intervals for the estimations. We construct two different algorithms to estimate this uncertainty and provide recommendations for their applicable situations. We also conduct a series of experiments on various datasets to verify asymptotic normality and propose a practical trading scenario enabled by this method.

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.17719 [pdf]

A new moment-independent uncertainty importance measure based on cumulative residual entropy for developing uncertainty reduction strategies

Authors: Shi-Shun Chen, Xiao-Yang Li

Abstract: Uncertainty reduction is vital for improving system reliability and reducing risks. To identify the best target for uncertainty reduction, an uncertainty importance measure is commonly used to prioritize the significance of input variable uncertainties. Then, designers will take steps to reduce the uncertainties of variables with high importance. However, for variables with minimal uncertainty, the cost of controlling their uncertainties can be unacceptable. Therefore, uncertainty magnitude should also be considered in developing uncertainty reduction strategies. Although variance-based methods have been developed for this purpose, they are dependent on statistical moments and have limitations when dealing with highly-skewed distributions that are commonly encountered in practical applications. Motivated by this problem, we propose a new uncertainty importance measure based on cumulative residual entropy. The proposed measure is moment-independent and based on the cumulative distribution function, so it can handle highly-skewed distributions properly. Numerical implementations for estimating the proposed measure are devised and verified. A real-world engineering case considering highly-skewed distributions is introduced to show the procedure of developing uncertainty reduction strategies considering uncertainty magnitude and corresponding cost. The results demonstrate that the proposed measure can present a different uncertainty reduction recommendation compared to the variance-based approach because of its moment-independent characteristic.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.17718 [pdf]

Comparison of global sensitivity analysis methods for a fire spread model with a segmented characteristic

Authors: Shi-Shun Chen, Xiao-Yang Li

Abstract: Global sensitivity analysis (GSA) can provide rich information for controlling output uncertainty. In practical applications, segmented models are commonly used to describe an abrupt model change. For segmented models, the complicated uncertainty propagation during the transition region may lead to different importance rankings of different GSA methods. If an unsuitable GSA method is applied, misleading results will be obtained, resulting in suboptimal or even wrong decisions. In this paper, four GSA indices, i.e., Sobol index, mutual information, delta index and PAWN index, are applied for a segmented fire spread model (Dry Eucalypt). The results show that four GSA indices give different importance rankings during the transition region since segmented characteristics affect different GSA indices in different ways. We suggest that analysts should rely on the results of different GSA indices according to their practical purpose, especially when making decisions for segmented models during the transition region.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.16739 [pdf, other]

Forecasting Automotive Supply Chain Shortfalls with Heterogeneous Time Series

Authors: Bach Viet Do, Xingyu Li, Chaoye Pan, Oleg Gusikhin

Abstract: Operational disruptions can significantly impact companies' performance. Ford, with its 37 plants globally, uses 17 billion parts annually to manufacture six million cars and trucks. With up to ten tiers of suppliers between the company and raw materials, any extended disruption in this supply chain can cause substantial financial losses. Therefore, the ability to forecast and identify such disruptions early is crucial for maintaining seamless operations. In this study, we demonstrate how we construct a dataset consisting of many multivariate time series to forecast first-tier supply chain disruptions, utilizing features related to capacity, inventory, utilization, and processing, as outlined in the classical Factory Physics framework. This dataset is technically challenging due to its vast scale of over five hundred thousand time series. Furthermore, these time series, while exhibiting certain similarities, also display heterogeneity within specific subgroups. To address these challenges, we propose a novel methodology that integrates an enhanced Attention Sequence to Sequence Deep Learning architecture, using Neural Network Embeddings to model group effects, with a Survival Analysis model. This model is designed to learn intricate heterogeneous data patterns related to operational disruptions. Our model has demonstrated a strong performance, achieving 0.85 precision and 0.8 recall during the Quality Assurance (QA) phase across Ford's five North American plants.
Additionally, to address the common criticism of Machine Learning models as black boxes, we show how the SHAP framework can be used to generate feature importance from the model predictions. It offers valuable insights that can lead to actionable strategies and highlights the potential of advanced machine learning for managing and mitigating supply chain risks in the automotive industry.

Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.13261 [pdf, other]

Enhanced inference for distributions and quantiles of individual treatment effects in various experiments

Authors: Zhe Chen, Xinran Li

Abstract: Understanding treatment effect heterogeneity has become increasingly important in many fields. In this paper we study distributions and quantiles of individual treatment effects to provide a more comprehensive and robust understanding of treatment effects beyond usual averages, although they are more challenging to infer due to nonidentifiability from observed data. Recent randomization-based approaches offer finite-sample valid inference for treatment effect distributions and quantiles in both completely randomized and stratified randomized experiments, but can be overly conservative by assuming the worst-case scenario where units with large effects are all assigned to the treated (or control) group. We introduce two improved methods to enhance the power of these existing approaches. The first method reinterprets existing approaches as inferring treatment effects among only treated or control units, and then combines the inference for treated and control units to infer treatment effects for all units. The second method explicitly controls for the actual number of treated units with large effects. Both simulation and applications demonstrate the substantial gain from the improved methods. These methods are further extended to sampling-based experiments as well as quasi-experiments from matching, in which the ideas for both improved methods play critical and complementary roles.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2406.18137 [pdf, ps, other]

Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Authors: Dongya Wu, Xin Li

Abstract: Generalization theory has been established for sparse deep neural networks under the high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, because the convergence of parameter estimation heavily relies on the regularity of the Hessian matrix, while the Hessian matrix of deep neural networks is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose to conduct nonparametric estimation of partial derivatives with respect to inputs. We first show that model convergence of sparse deep neural networks is guaranteed in that the sample complexity only grows with the logarithm of the number of parameters or the input dimension when the $\ell_{1}$-norm of parameters is well constrained. Then by bounding the norm and the divergence of partial derivatives, we establish that the convergence rate of nonparametric estimation of partial derivatives scales as $\mathcal{O}(n^{-1/4})$, a rate which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study combines nonparametric estimation and parametric sparse deep neural networks for the first time.
As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results show a promising future for the interpretability of deep neural networks.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.06980 [pdf, other]

Sensitivity Analysis for the Test-Negative Design

Authors: Soumyabrata Kundu, Peng Ding, Xinran Li, Jingshu Wang

Abstract: The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience in data collection, the design is also believed to control for the differential health-care-seeking behavior between vaccinated and unvaccinated individuals, which is an important yet often unmeasured confounder between vaccination and infection. Hence, the design has been employed routinely to monitor seasonal flu vaccines and more recently to measure the COVID-19 vaccine effectiveness. Despite its popularity, the design has been questioned, in particular about its ability to fully control for the unmeasured confounding. In this paper, we explore deviations from a perfect test-negative design, and propose various sensitivity analysis methods for estimating the effect of vaccination measured by the causal odds ratio on the subpopulation of individuals with good health-care-seeking behavior. We start with point identification of the causal odds ratio under a test-negative design, considering two forms of assumptions on the unmeasured confounder. These assumptions then lead to two approaches for conducting sensitivity analysis, addressing the influence of the unmeasured confounding in different ways. Specifically, one approach investigates partial control for the unmeasured confounder in the test-negative design, while the other examines the impact of the unmeasured confounder on both vaccination and infection.
Furthermore, these approaches can be combined to provide narrower bounds on the true causal odds ratio, and can be further extended to sharpen the bounds by restricting the treatment effect heterogeneity. Finally, we apply the proposed methods to evaluate the effectiveness of COVID-19 vaccines using observational data from test-negative designs.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05855 [pdf, other]

Self-Distilled Disentangled Learning for Counterfactual Prediction

Authors: Xinshu Li, Mingming Gong, Lina Yao

Abstract: The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenarios, especially within high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders.

Submitted 14 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05637 [pdf, ps, other]

A Generalized Version of Chung's Lemma and its Applications

Authors: Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu

Abstract: Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(θ,μ)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 43 pages, 5 figures

MSC Class: 90C15; 90C30; 90C26

arXiv:2406.05340 [pdf, other]

Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models

Authors: Yucheng Liu, Xiaodong Li

Abstract: We investigate how to select the number of communities for weighted networks without full likelihood modeling. First, we propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled in the same way as in the standard DCSBM, while the variance profile matrix is assumed to be related to the mean adjacency matrix through a given variance function. Our method for selecting the number of communities is based on a sequential testing framework, in each step of which the weighted DCSBM is fitted via some spectral clustering method. A key step is to carry out matrix scaling on the estimated variance profile matrix. The resulting scaling factors can be used to normalize the adjacency matrix, from which the test statistic is obtained. Under mild conditions on the weighted DCSBM, our proposed procedure is shown to be consistent in estimating the true number of communities. Numerical experiments on both simulated and real network data also demonstrate the desirable empirical properties of our method.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 3 figures, 2 tables

arXiv:2406.01653 [pdf, other]

An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks

Authors: Mingtao Xia, Xiangting Li, Qijing Shen, Tom Chou

Abstract: We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.

Submitted 3 June, 2024; originally announced June 2024.

MSC Class: 60G07; 60J76

arXiv:2405.18373 [pdf, other]

A Hessian-Aware Stochastic Differential Equation for Modelling SGD

Authors: Xiang Li, Zebang Shen, Liang Zhang, Niao He

Abstract: Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points. However, existing stochastic differential equation (SDE) models fail to fully capture these behaviors, even for simple quadratic objectives. Built on a novel stochastic backward error analysis framework, we derive the Hessian-Aware Stochastic Modified Equation (HA-SME), an SDE that incorporates Hessian information of the objective function into both its drift and diffusion terms. Our analysis shows that HA-SME matches the order-best approximation error guarantee among existing SDE models in the literature, while achieving a significantly reduced dependence on the smoothness parameter of the objective. Further, for quadratic objectives, under mild conditions, HA-SME is proved to be the first SDE model that recovers exactly the SGD dynamics in the distributional sense. Consequently, when the local landscape near a stationary point can be approximated by quadratics, HA-SME is expected to accurately predict the local escaping behaviors of SGD.

Submitted 5 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16859 [pdf, other]

Gaussian Mixture Model with Rare Events

Authors: Xuetong Li, Jing Zhou, Hansheng Wang

Abstract: We study here a Gaussian Mixture Model (GMM) with rare events data. In this case, the commonly used Expectation-Maximization (EM) algorithm exhibits an extremely slow numerical convergence rate. To theoretically understand this phenomenon, we formulate the numerical convergence problem of the EM algorithm with rare events data as a problem about a contraction operator. Theoretical analysis reveals that the spectral radius of the contraction operator in this case could be arbitrarily close to 1 asymptotically. This theoretical finding explains the empirical slow numerical convergence of the EM algorithm with rare events data. To overcome this challenge, a Mixed EM (MEM) algorithm is developed, which utilizes the information provided by partially labeled data. As compared with the standard EM algorithm, the key feature of the MEM algorithm is that it requires additionally labeled data. We find that the MEM algorithm significantly improves the numerical convergence rate as compared with the standard EM algorithm. The finite sample performance of the proposed method is illustrated by both simulation studies and a real-world dataset of Swedish traffic signs.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15115 [pdf, other]

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification

Authors: Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li

Abstract: Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks, and different from all the existing literature, we consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y|X]$ and the conditional variance Var$(Y|X)$. This additional uncertainty quantification objective provides a handle to (i) better design out-of-distribution experiments to distinguish ICL from in-weight learning (IWL) and (ii) make a better separation between the algorithms with and without using the prior information of the training distribution. Theoretically, we show that the trained Transformer reaches near Bayes-optimum, suggesting the usage of the information of the training distribution. Our method can be extended to other cases. Specifically, with the Transformer's context window $S$, we prove a generalization bound of $\tilde{\mathcal{O}}(\sqrt{\min\{S, T\}/(n T)})$ on $n$ tasks with sequences of length $T$, providing sharper analysis compared to previous results of $\tilde{\mathcal{O}}(\sqrt{1/n})$. Empirically, we illustrate that while the trained Transformer behaves as the Bayes-optimal solution as a natural consequence of supervised training in distribution, it does not necessarily perform a Bayesian inference when facing task shifts, in contrast to the \textit{equivalence} between these two proposed in much of the existing literature. We also demonstrate the trained Transformer's ICL ability over covariate shift and prompt-length shift and interpret them as a generalization over a meta distribution.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06779 [pdf, other]

Generalization Problems in Experiments Involving Multidimensional Decisions

Authors: Jiawei Fu, Xiaojun Li

Abstract: Can the causal effects estimated in experiments be generalized to real-world scenarios? This question lies at the heart of social science studies. External validity primarily assesses whether experimental effects persist across different settings, implicitly presuming the experiment's ecological validity, that is, the consistency of experimental effects with their real-life counterparts. However, we argue that this presumed consistency may not always hold, especially in experiments involving multidimensional decision processes, such as conjoint experiments. We introduce a formal model to elucidate how attention and salience effects lead to three types of inconsistencies between experimental findings and real-world phenomena: amplified effect magnitude, effect sign reversal, and effect importance reversal. We derive testable hypotheses from each theoretical outcome and test these hypotheses using data from various existing conjoint experiments and our own experiments. Drawing on our theoretical framework, we propose several recommendations for experimental design aimed at enhancing the generalizability of survey experiment findings.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.19292 [pdf, other]

Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

Authors: Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li

Abstract: This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of $\tilde{O}(\sqrt{K})$ for $K$ episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying less computational complexity. Moreover, by leveraging the flexibility of the IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm Compressed-MAIDS wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19242 [pdf, other]

A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown's distortion model. Besides, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

arXiv:2404.17615 [pdf]

DeepVARMA: A Hybrid Deep Learning and VARMA Model for Chemical Industry Index Forecasting

Authors: Xiang Li, Hu Yang

Abstract: Since the chemical industry index is one of the important indicators to measure the development of the chemical industry, forecasting it is critical for understanding the economic situation and trends of the industry. Taking a multivariable nonstationary series, the synthetic material index, as the main research object, this paper proposes a new prediction model: DeepVARMA, and its variants Deep-VARMA-re and DeepVARMA-en, which combine LSTM and VARMAX models. The new model first uses a deep learning model such as the LSTM to remove the trends of the target time series and to learn the representation of endogenous variables, then uses the VARMAX model to predict the detrended target time series with the embeddings of endogenous variables, and finally combines the trend learned by the LSTM and the dependency learned by the VARMAX model to obtain the final predictive values. The experimental results show that (1) the new model achieves the best prediction accuracy by combining the LSTM encoding of the exogenous variables and the VARMAX model. (2) In multivariate non-stationary series prediction, DeepVARMA uses a phased processing strategy to show higher adaptability and accuracy compared to the traditional VARMA model as well as the machine learning models LSTM, RF and XGBoost. (3) Compared with smooth sequence prediction, the traditional VARMA and VARMAX models fluctuate more in predicting non-smooth sequences, while DeepVARMA shows more flexibility and robustness. This study provides more accurate tools and methods for future development and scientific decision-making in the chemical industry.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.08472 [pdf, other]

TSLANet: Rethinking Transformers for Time Series Representation Learning

Authors: Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Xiaoli Li

Abstract: Time series data, characterized by its intrinsic long and short-range dependencies, poses a unique challenge across analytical applications. While Transformer-based models excel at capturing long-range dependencies, they face limitations in noise sensitivity, computational efficiency, and overfitting with smaller datasets. In response, we introduce a novel Time Series Lightweight Adaptive Network (TSLANet), as a universal convolutional model for diverse time series tasks. Specifically, we propose an Adaptive Spectral Block, harnessing Fourier analysis to enhance feature representation and to capture both long-term and short-term interactions while mitigating noise via adaptive thresholding. Additionally, we introduce an Interactive Convolution Block and leverage self-supervised learning to refine the capacity of TSLANet for decoding complex temporal patterns and improve its robustness on different datasets. Our comprehensive experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection, showcasing its resilience and adaptability across a spectrum of noise levels and data sizes. The code is available at https://github.com/emadeldeen24/TSLANet.

Submitted 6 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted in ICML 2024

arXiv:2404.06013 [pdf, other]

Feel-Good Thompson Sampling for Contextual Dueling Bandits

Authors: Xuheng Li, Heyang Zhao, Quanquan Gu

Abstract: Contextual dueling bandits, where a learner compares two options based on context and receives feedback indicating which was preferred, extend classic dueling bandits by incorporating contextual information for decision-making and preference learning. Several algorithms based on the upper confidence bound (UCB) have been proposed for linear contextual dueling bandits. However, no algorithm based on posterior sampling has been developed in this setting, despite the empirical success observed in traditional contextual bandits. In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits. At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits. This term leverages the independence of the two selected arms, thereby avoiding a cross term in the analysis. We show that our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt T)$, where $d$ is the model dimension and $T$ is the time horizon. Finally, we evaluate our algorithm on synthetic data and observe that FGTS.CDB outperforms existing algorithms by a large margin.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 30 pages, 6 figures

arXiv:2404.05933 [pdf, other]

fastcpd: Fast Change Point Detection in R

Authors: Xingchi Li, Xianyang Zhang

Abstract: Change point analysis is concerned with detecting and locating structure breaks in the underlying model of a sequence of observations ordered by time, space or other variables. A widely adopted approach for change point analysis is to minimize an objective function with a penalty term on the number of change points. This framework includes several well-established procedures, such as the penalized log-likelihood using the (modified) Bayesian information criterion (BIC) or the minimum description length (MDL). The resulting optimization problem can be solved in polynomial time by dynamic programming or its improved version, such as the Pruned Exact Linear Time (PELT) algorithm (Killick, Fearnhead, and Eckley 2012). However, existing computational methods often suffer from two primary limitations: (1) methods based on direct implementation of dynamic programming or PELT are often time-consuming for long data sequences due to repeated computation of the cost value over different segments of the data sequence; (2) state-of-the-art R packages do not provide enough flexibility for users to handle different change point settings and models. In this work, we present the fastcpd package, aiming to provide an efficient and versatile framework for change point detection in several commonly encountered settings. The core of our algorithm is built upon PELT and the sequential gradient descent method recently proposed by Zhang and Dawn (2023). We illustrate the usage of the fastcpd package through several examples, including mean/variance changes in a (multivariate) Gaussian sequence, parameter changes in regression models, structural breaks in ARMA/GARCH/VAR models, and changes in user-specified models.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 53 pages, 16 figures

arXiv:2404.05484 [pdf, other]

On Computational Modeling of Sleep-Wake Cycle

Authors: Xin Li

Abstract: Why do mammals need to sleep? Neuroscience treats sleep and wake as default and perturbation modes of the brain. It is hypothesized that the brain self-organizes neural activities without environmental inputs. This paper presents a new computational model of the sleep-wake cycle (SWC) for learning and memory. During the sleep mode, the memory consolidation by the thalamocortical system is abstracted by a disentangling operator that maps context-dependent representations (CDR) to context-independent representations (CIR) for generalization. Such a disentangling operator can be mathematically formalized by an integral transform that integrates the context variable from CDR. During the wake mode, the memory formation by the hippocampal-neocortical system is abstracted by an entangling operator from CIR to CDR where the context is introduced by physical motion. When designed as inductive bias, entangled CDR linearizes the problem of unsupervised learning for sensory memory by direct-fit. The concatenation of disentangling and entangling operators forms a disentangling-entangling cycle (DEC) as the building block for sensorimotor learning. We also discuss the relationship of DEC and SWC to the perception-action cycle (PAC) for internal model learning and perceptual control theory for the ecological origin of natural languages.

Submitted 17 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.01245 [pdf, other]

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key -- provided by the LLM to the verifier -- to enable controlling the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks -- one of which has been internally implemented at OpenAI -- and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.

Submitted 28 August, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00474 [pdf, other]

Linguistic Calibration of Long-Form Generations

Authors: Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto

Abstract: Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate. This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions. This definition enables a training framework where a supervised finetuning step bootstraps an LM to emit long-form generations with confidence statements such as "I estimate a 30% chance of..." or "I am certain that...", followed by a reinforcement learning step which rewards generations that enable a user to provide calibrated answers to related questions. We linguistically calibrate Llama 2 7B and find in automated and human evaluations of long-form generations that it is significantly more calibrated than strong finetuned factuality baselines with comparable accuracy. These findings generalize under significant domain shifts to scientific and biomedical questions and to an entirely held-out person biography generation task. Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.

Submitted 4 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: ICML 2024. Code available at https://github.com/tatsu-lab/linguistic_calibration

arXiv:2403.13027 [pdf, other]

Towards Better Statistical Understanding of Watermarking LLMs

Authors: Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li

Abstract: In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result guarantees an averaged increased green list probability and henceforth detection ability explicitly (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of ``distortion-free'' and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11163 [pdf, ps, other]

doi 10.1080/24754269.2024.2343151

A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation where the dataset size is too huge to be comfortably handled by one single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns the situation where the sample size of the dataset is small enough to be placed on one single computer but too large to be easily processed by its memory as a whole. The last class of literature studies those minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.02696 [pdf, ps, other]

Low-rank matrix estimation via nonconvex spectral regularized methods in errors-in-variables matrix regression

Authors: Xin Li, Dongya Wu

Abstract: High-dimensional matrix regression has been studied in various aspects, such as statistical properties, computational efficiency and application to specific instances including multivariate regression, system identification and matrix compressed sensing. Current studies mainly consider the idealized case that the covariate matrix is obtained without noise, while the more realistic scenario that the covariates may always be corrupted with noise or missing data has received little attention. We consider the general errors-in-variables matrix regression model and propose a unified framework for low-rank estimation based on nonconvex spectral regularization. Then in the statistical aspect, recovery bounds for any stationary points are provided to achieve statistical consistency. In the computational aspect, the proximal gradient method is applied to solve the nonconvex optimization problem and is proved to converge in polynomial time. Consequences for specific matrix compressed sensing models with additive noise and missing data are obtained via verifying corresponding regularity conditions. Finally, the performance of the proposed nonconvex estimation method is illustrated by numerical experiments.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.11858 [pdf, ps, other]

Stochastic Hessian Fittings with Lie Groups

Authors: Xi-Lin Li

Abstract: This paper studies the fitting of Hessian or its inverse for stochastic optimizations using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is intimately related to many commonly used second order and adaptive gradient optimizers, e.g., BFGS, Gauss-Newton and natural gradient descent, AdaGrad, etc. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods, from closed-form to iterative solutions, using Hessian-vector products or stochastic gradients only, with Hessian fittings in the Euclidean space, the manifold of symmetric positive definite (SPD) matrices, to a variety of Lie groups. The most intriguing discovery is that the Hessian fitting itself as an optimization problem is strongly convex under mild conditions with a specific yet general enough Lie group. This discovery turns Hessian fitting into a well behaved optimization problem, and facilitates the designs of highly efficient and elegant Lie group sparse preconditioner fitting methods for large scale stochastic optimizations.

Submitted 14 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 13 pages, 6 figures, 3 tables

arXiv:2402.08602 [pdf, other]

Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation

Authors: Xiaoou Li, Hongru Zhao

Abstract: Motivated by modern applications such as computerized adaptive testing, sequential rank aggregation, and heterogeneous data source selection, we study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data. The goal is to design experiment selection rules for more accurate model estimation. Greedy information-based experiment selection methods, optimizing the information gain for one-step ahead, have been employed in practice thanks to their computational convenience, flexibility to context or task changes, and broad applicability. However, statistical analysis is restricted to one-dimensional cases due to the problem's combinatorial nature and the seemingly limited capacity of greedy algorithms, leaving the multidimensional problem open. In this study, we close the gap for multidimensional problems. In particular, we propose adopting a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood estimator following these selection rules. This class encompasses both existing methods and introduces new methods with improved numerical efficiency. We prove that these methods produce consistent and asymptotically normal estimators. Additionally, within a decision theory framework, we establish that the proposed methods achieve asymptotic optimality when the risk measure aligns with the selection rule. We also conduct extensive numerical studies on both simulated and real data to illustrate the efficacy of the proposed methods. From a technical perspective, we devise new analytical tools to address theoretical challenges. These analytical tools are of independent theoretical interest and may be reused in related problems involving stochastic approximation and sequential designs.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.08151 [pdf, other]

Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification models

Authors: Joshua C Chang, Xiangting Li, Shixin Xu, Hao-Ren Yao, Julia Porcino, Carson Chow

Abstract: We introduce a set of gradient-flow-guided adaptive importance sampling (IS) transformations to stabilize Monte-Carlo approximations of point-wise leave one out cross-validated (LOO) predictions for Bayesian classification models. One can leverage this methodology for assessing model generalizability by for instance computing a LOO analogue to the AIC or computing LOO ROC/PRC curves and derived metrics like the AUROC and AUPRC. By the calculus of variations and gradient flow, we derive two simple nonlinear single-step transformations that utilize gradient information to shift a model's pre-trained full-data posterior closer to the target LOO posterior predictive distributions. In doing so, the transformations stabilize importance weights. Because the transformations involve the gradient of the likelihood function, the resulting Monte Carlo integral depends on Jacobian determinants with respect to the model Hessian. We derive closed-form exact formulae for these Jacobian determinants in the cases of logistic regression and shallow ReLU-activated artificial neural networks, and provide a simple approximation that sidesteps the need to compute full Hessian matrices and their spectra. We test the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.

Submitted 12 February, 2024; originally announced February 2024.

Comments: Submitted

arXiv:2402.05009 [pdf, other]

A Review on Trajectory Datasets on Advanced Driver Assistance System

Authors: Hang Zhou, Ke Ma, Xiaopeng Li

Abstract: This paper presents a comprehensive review of trajectory data from vehicles equipped with Advanced Driver Assistance Systems, with the aim of precisely modeling Autonomous Vehicle (AV) behavior. This study emphasizes the importance of trajectory data in the development of AV models, especially in car-following scenarios. We introduce and evaluate several datasets: the OpenACC Dataset, the Connected & Autonomous Transportation Systems Laboratory Open Dataset, the Vanderbilt ACC Dataset, the Central Ohio Dataset, and the Waymo Open Dataset. Each dataset offers unique insights into AV behaviors, yet they share common challenges in terms of data availability, processing, and standardization. After a series of data cleaning, outlier removal, and statistical analysis steps, this paper transforms datasets of varied formats into a uniform standard, thereby improving their applicability for modeling AV car-following behavior. Key contributions of this study include: (1) the transformation of all datasets into a unified standard format, enhancing their utility for broad research applications; (2) a comparative analysis of these datasets, highlighting their distinct characteristics and implications for car-following model development; (3) the provision of guidelines for future data collection projects, along with the open-source release of all processed data and code for use by the research community.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures

arXiv:2402.02701 [pdf, other]

Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

Authors: Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu

Abstract: Recently, there have been many efforts to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., distractors may be present during deployment. Many practical algorithms have been proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we address this issue by theoretically identifying the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical factor in reducing the generalization gap. Our theoretical results are supported by empirical evidence on the DMControl Generalization Benchmark (DMC-GB).

Submitted 4 February, 2024; originally announced February 2024.

Comments: Part of this work is accepted as AAMAS 2024 extended abstract

arXiv:2401.16308 [pdf, ps, other]

A Comprehensive Study of Covid 19 in Florida

Authors: Julian Bennett, Lauren Eriksen, Xingjie Helen Li

Abstract: Within any highly contagious and unpredictable disease lies a predictable and attainable growth rate that researchers can estimate in order to draw logistical conclusions about that disease and its affected regions. The foundation that researchers draw on when studying a particular disease and estimating its growth rate is the Susceptible-Infected-Removed (SIR) model, expressed as a system of differential equations. The issue with the SIR model lies not in its complexity but in its simplicity: it omits a potentially large but finite number of factors, limited by the amount of data available for each factor. Our research involves the application of multiple regressions to identify our Covid lockdown periods, followed by the modification of the SIR model. This involved creating new model approximations such as the time-delayed SIR model and the reinfected SIR model in order to take into account factors such as incubation and reinfection, and to obtain the lowest possible error for our infection rate. We were able to conclude that the more factors we took into account, the lower our error rate and the more accurate our results became. We could also identify outlier Metros and draw certain conclusions about their performance levels and the reasons behind them. We then looked for correlations, if any, between the infection rates and outside factors, using demographic and weather data.
We found that a few factors show high correlations, including graduate education and low temperatures.
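For readers unfamiliar with the baseline that this abstract modifies, the following is a minimal sketch of the classic SIR equations (dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI) integrated with forward Euler steps. It is not the authors' time-delayed or reinfected variant; the function name `simulate_sir` and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with forward Euler steps.

    beta  : transmission rate per day (illustrative value, not fitted)
    gamma : removal rate per day (illustrative value, not fitted)
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0  # total population, constant in the basic model
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I
        new_removals = gamma * i * dt           # flow I -> R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((k * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
    peak_t, _, peak_i, _ = max(traj, key=lambda row: row[2])
    print(f"peak infections: {peak_i:.0f} on day {peak_t:.1f}")
```

Variants such as those named in the abstract would presumably change the flow terms (for example, a delay before infection or a flow from R back to S); those changes are not shown here because the listing does not specify them.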

Submitted 1 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

MSC Class: 93A30

arXiv:2401.14142 [pdf, other]

Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations

Authors: Xinyue Xu, Yi Qin, Lu Mi, Hao Wang, Xiaomeng Li

Abstract: Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interaction between concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"), therefore failing to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are then represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state-of-the-art on real-world datasets.

Submitted 26 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted by ICLR 2024

arXiv:2401.11354 [pdf, other]

Squared Wasserstein-2 Distance for Efficient Reconstruction of Stochastic Differential Equations

Authors: Mingtao Xia, Xiangting Li, Qijing Shen, Tom Chou

Abstract: We provide an analysis of the squared Wasserstein-2 ($W_2$) distance between two probability distributions associated with two stochastic differential equations (SDEs). Based on this analysis, we propose the use of squared $W_2$ distance-based loss functions in the \textit{reconstruction} of SDEs from noisy data. To demonstrate the practicality of our Wasserstein distance-based loss functions, we performed numerical experiments that demonstrate the efficiency of our method in reconstructing SDEs that arise across a number of applications.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 37 pages, 5 figures

MSC Class: 60H10; 49Q22

arXiv:2401.07267 [pdf, other]

Inference for high-dimensional linear expectile regression with de-biased method

Authors: Xiang Li, Yu-Ning Li, Li-Xin Zhang, Jun Zhao

Abstract: In this paper, we address the inference problem in high-dimensional linear expectile regression. We transform the expectile loss into a weighted-least-squares form and apply a de-biased strategy to establish Wald-type tests for multiple constraints within a regularized framework. Simultaneously, we construct an estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension with general amenable regularizers including Lasso and SCAD, and demonstrate its consistency through a new proof technique. We conduct simulation studies and real data applications to demonstrate the efficacy of our proposed test statistic in both homoscedastic and heteroscedastic scenarios.

Submitted 14 January, 2024; originally announced January 2024.

Comments: 34 pages

MSC Class: 62F05; 62F12; 62J12

arXiv:2401.06348 [pdf, other]

A Fully Bayesian Approach for Comprehensive Mapping of Magnitude and Phase Brain Activation in Complex-Valued fMRI Data

Authors: Zhengxin Wang, Daniel B. Rowe, Xinyi Li, D. Andrew Brown

Abstract: Functional magnetic resonance imaging (fMRI) plays a crucial role in neuroimaging, enabling the exploration of brain activity through complex-valued signals. These signals, composed of magnitude and phase, offer a rich source of information for understanding brain functions. Traditional fMRI analyses have largely focused on magnitude information, often overlooking the potential insights offered by phase data. In this paper, we propose a novel fully Bayesian model designed for analyzing single-subject complex-valued fMRI (cv-fMRI) data. Our model, which we refer to as the CV-M&P model, is distinctive in its comprehensive utilization of both magnitude and phase information in fMRI signals, allowing for independent prediction of different types of activation maps. We incorporate Gaussian Markov random fields (GMRFs) to capture spatial correlations within the data, and employ image partitioning and parallel computation to enhance computational efficiency. Our model is rigorously tested through simulation studies, and then applied to a real dataset from a unilateral finger-tapping experiment. The results demonstrate the model's effectiveness in accurately identifying brain regions activated in response to specific tasks, distinguishing between magnitude and phase activation.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.04857 [pdf, other]

Transportation Marketplace Rate Forecast Using Signature Transform

Authors: Haotian Gu, Xin Guo, Timothy L. Jacobs, Philip Kaminsky, Xinyu Li

Abstract: Freight transportation marketplace rates are typically challenging to forecast accurately. In this work, we have developed a novel statistical technique based on signature transforms and have built a predictive and adaptive model to forecast these marketplace rates. Our technique is based on two key elements of the signature transform: one being its universal nonlinearity property, which linearizes the feature space and hence translates the forecasting problem into linear regression, and the other being the signature kernel, which allows for computationally efficient comparison of similarities between time series data. Combined, these elements allow for efficient feature generation and precise identification of seasonality and regime switching in the forecasting process. An algorithm based on our technique has been deployed by Amazon trucking operations, with far superior forecast accuracy and better interpretability versus commercially available industry models, even during the COVID-19 pandemic and the Ukraine conflict. Furthermore, our technique is able to capture the influence of business cycles and the heterogeneity of the marketplace, improving prediction accuracy by more than fivefold, with an estimated annualized saving of \$50MM.

Submitted 14 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03893 [pdf, other]

Finite-Time Decoupled Convergence in Nonlinear Two-Time-Scale Stochastic Approximation

Authors: Yuze Han, Xiang Li, Zhihua Zhang

Abstract: In two-time-scale stochastic approximation (SA), two iterates are updated at varying speeds using different step sizes, with each update influencing the other. Previous studies in linear two-time-scale SA have found that the convergence rates of the mean-square errors for these updates are dependent solely on their respective step sizes, leading to what is referred to as decoupled convergence. However, the possibility of achieving this decoupled convergence in nonlinear SA remains less understood. Our research explores the potential for finite-time decoupled convergence in nonlinear two-time-scale SA. We find that under a weaker Lipschitz condition, traditional analyses are insufficient for achieving decoupled convergence. This finding is further numerically supported by a counterexample. But by introducing an additional condition of nested local linearity, we show that decoupled convergence is still feasible, contingent on the appropriate choice of step sizes associated with smoothness parameters. Our analysis depends on a refined characterization of the matrix cross term between the two iterates and utilizes fourth-order moments to control higher-order approximation errors induced by the local linearity assumption.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.01064 [pdf, other]

Robust Inference for Multiple Predictive Regressions with an Application on Bond Risk Premia

Authors: Xiaosai Liao, Xinjue Li, Qingliang Fan

Abstract: We propose a robust hypothesis testing procedure for the predictability of multiple predictors that could be highly persistent. Our method improves on the popular extended instrumental variable (IVX) testing (Phillips and Lee, 2013; Kostakis et al., 2015) in that, besides addressing the two bias effects found in Hosseinkouchack and Demetrescu (2021), we find and deal with the variance-enlargement effect. We show that two types of higher-order terms induce these distortion effects in the test statistic, leading to significant over-rejection for one-sided tests and tests in multiple predictive regressions. Our improved IVX-based test includes three steps to tackle all the issues above regarding finite sample bias and variance terms. Thus, the test statistic performs well in size control, while its power performance is comparable with the original IVX. Monte Carlo simulations and an empirical study on the predictability of bond risk premia are provided to demonstrate the effectiveness of the newly proposed approach.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.10920 [pdf, other]

Domain adaption and physical constrains transfer learning for shale gas production

Authors: Zhaozhong Yang, Liangjie Gou, Chao Min, Duo Yi, Xiaogang Li, Guoquan Wen

Abstract: Effective prediction of shale gas production is crucial for strategic reservoir development. However, in new shale gas blocks, two main challenges are encountered: (1) the occurrence of negative transfer due to insufficient data, and (2) the limited interpretability of deep learning (DL) models. To tackle these problems, we propose a novel transfer learning methodology that utilizes domain adaptation and physical constraints. This methodology effectively employs historical data from the source domain to reduce negative transfer from the data distribution perspective, while also using physical constraints to build a robust and reliable prediction model that integrates various types of data. The methodology starts by dividing the production data from the source domain into multiple subdomains, thereby enhancing data diversity. It then uses Maximum Mean Discrepancy (MMD) and global average distance measures to decide on the feasibility of transfer. Through domain adaptation, we integrate all transferable knowledge, resulting in a more comprehensive target model. Lastly, by incorporating drilling, completion, and geological data as physical constraints, we develop a hybrid model. This model, a combination of a multi-layer perceptron (MLP) and a Transformer (Transformer-MLP), is designed to maximize interpretability. Experimental validation in China's southwestern region confirms the method's effectiveness.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.09393 [pdf]

Bi-scale Car-following Model Calibration for Corridor Based on Trajectory

Authors: Keke Long, Haotian Shi, Zhiwei Chen, Zhaohui Liang, Xiaopeng Li, Felipe de Souza

Abstract: The precise estimation of macroscopic traffic parameters, such as travel time and fuel consumption, is essential for the optimization of traffic management systems. Despite its importance, the comprehensive acquisition of vehicle trajectory data for the calculation of these macroscopic measures presents a challenge. To bridge this gap, this study aims to calibrate car-following models capable of predicting both microscopic measures and macroscopic measures. We conduct a numerical analysis to trace the cumulative process of model prediction errors across various measurements, and our findings indicate that macroscopic measures encapsulate the accumulation of model errors. By incorporating macroscopic measures into vehicle model calibration, we can mitigate the impact of noise on microscopic data measurements. We compare three car-following model calibration methods, MiC (using microscopic measurements), MaC (using macroscopic measurements), and BiC (using both microscopic and macroscopic measurements), on real-world trajectory data. The BiC method emerges as the most successful in reconstructing vehicle trajectories and accurately estimating travel time and fuel consumption, whereas the MiC method leads to overfitting and inaccurate macro-measurement predictions. This study underscores the importance of bi-scale calibration for precise traffic and energy consumption predictions, laying the groundwork for future research aimed at enhancing traffic management strategies.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.05757 [pdf, ps, other]

doi 10.1016/j.ipm.2023.103600

Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph

Authors: Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

Abstract: Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.

Submitted 9 December, 2023; originally announced December 2023.

Comments: 28 pages, 10 figures, 6 tables, accepted by Information Processing & Management

Journal ref: Information Processing & Management, 60 (2024) 1-21

arXiv:2312.02513 [pdf, other]

Asymptotic Theory of the Best-Choice Rerandomization using the Mahalanobis Distance

Authors: Yuhao Wang, Xinran Li

Abstract: Rerandomization, a design that utilizes pretreatment covariates and improves their balance between different treatment groups, has received attention recently in both theory and practice. There are at least two types of rerandomization that are used in practice: the first rerandomizes the treatment assignment until covariate imbalance is below a prespecified threshold; the second randomizes the treatment assignment multiple times and chooses the one with the best covariate balance. In this paper we will consider the second type of rerandomization, namely the best-choice rerandomization, whose theory and inference are still lacking in the literature. In particular, we will focus on the best-choice rerandomization that uses the Mahalanobis distance to measure covariate imbalance, which is one of the most commonly used imbalance measures for multivariate covariates and is invariant to affine transformations of covariates. We will study the large-sample repeated sampling properties of the best-choice rerandomization, allowing both the number of covariates and the number of tried complete randomizations to increase with the sample size. We show that the asymptotic distribution of the difference-in-means estimator is more concentrated around the true average treatment effect under rerandomization than under the complete randomization, and propose large-sample accurate confidence intervals for rerandomization that are shorter than those for the completely randomized experiment.
We further demonstrate that, with a moderate number of covariates and with the number of tried randomizations increasing polynomially with the sample size, the best-choice rerandomization can achieve the ideally optimal precision that one can expect even with perfectly balanced covariates. The developed theory and methods for rerandomization are also illustrated using real field experiments.
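The best-choice design described in this abstract (draw several complete randomizations, keep the one with the smallest Mahalanobis imbalance) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the simulated covariates, and the choice of 200 tries are all hypothetical, and the imbalance uses the standard form in which the covariance of the difference in means under complete randomization is the sample covariance scaled by n / (n1 * n0).

```python
import numpy as np


def mahalanobis_imbalance(x, assign):
    """Mahalanobis distance between treated and control covariate means."""
    n = len(assign)
    n1 = int(assign.sum())
    n0 = n - n1
    diff = x[assign == 1].mean(axis=0) - x[assign == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))  # sample covariance of covariates
    cov_diff = cov * n / (n1 * n0)  # covariance of diff under complete randomization
    return float(diff @ np.linalg.solve(cov_diff, diff))


def best_choice_rerandomization(x, n_treated, n_tries, rng):
    """Try n_tries complete randomizations; return the best-balanced one."""
    base = np.zeros(len(x), dtype=int)
    base[:n_treated] = 1
    best_assign, best_m = None, np.inf
    for _ in range(n_tries):
        assign = rng.permutation(base)  # one complete randomization
        m = mahalanobis_imbalance(x, assign)
        if m < best_m:
            best_assign, best_m = assign, m
    return best_assign, best_m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    covariates = rng.normal(size=(40, 3))  # simulated data, for illustration only
    assign, m = best_choice_rerandomization(covariates, 20, 200, rng)
    print("treated units:", int(assign.sum()), "imbalance:", round(m, 4))
```

Increasing `n_tries` can only lower (never raise) the selected imbalance for a fixed random stream, which is the mechanism behind the precision gains the abstract analyzes.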

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.15539 [pdf]

A Novel Human-Based Meta-Heuristic Algorithm: Dragon Boat Optimization

Authors: Xiang Li, Long Lan, Husam Lahza, Shaowu Yang, Shuihua Wang, Wenjing Yang, Hengzhu Liu, Yudong Zhang

Abstract: (Aim) Dragon Boat Racing, a popular aquatic folklore team sport, is traditionally held during the Dragon Boat Festival. Inspired by this event, we propose a novel human-based meta-heuristic algorithm called dragon boat optimization (DBO) in this paper. (Method) It models the unique behaviors of each crew member on the dragon boat during the race by introducing social psychology mechanisms (social loafing, social incentive). Throughout this process, the focus is on the interaction and collaboration among the crew members, as well as their decision-making in different situations. During each iteration, DBO implements different state updating strategies. By modelling the crew's behavior and adjusting the state updating strategies, DBO is able to maintain high-performance efficiency. (Results) We have tested the DBO algorithm with 29 mathematical optimization problems and 2 structural design problems. (Conclusion) The experimental results demonstrate that DBO is competitive with state-of-the-art meta-heuristic algorithms as well as conventional methods.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.14222 [pdf, other]

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Authors: Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu

Abstract: Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 85 pages, 5 figures

arXiv:2311.14142 [pdf]

Retention in STEM: Factors Influencing Student Persistence and Employment

Authors: Linli Zhou, Damji Heo Stratton, Xin Li

Abstract: This study utilizes data from the Baccalaureate and Beyond Longitudinal Study to explore factors associated with the likelihood of students' employment in STEM fields one year after graduation. We examined various factors related to students' individual characteristics (e.g., gender, race, and financial situation), institutional experiences (e.g., major, academic standing, research involvement, internships, extracurricular activities, and undergraduate practicum), and institutional and national trends. The results indicate lower STEM employment likelihood for minority groups and students with academic probation. The findings also highlight the positive impact of undergraduate practicum and job relevance to major on STEM employment likelihood. On the contrary, career services were negatively associated with the likelihood of students' STEM occupation choice, suggesting potential shortcomings in STEM job preparation within these services. The study provides valuable insights and actionable recommendations for policymakers and educators seeking to increase diversity and inclusion in STEM fields, suggesting the need for more efficient and tailored educational interventions and curriculum development.

Submitted 24 June, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Journal ref: The Proceedings of the 19th Annual National Symposium on Student Retention, 2023

Showing 1–50 of 528 results for author: Li, X