Search | arXiv e-print repository

New User Event Prediction Through the Lens of Causal Inference

Authors: Henry Shaowu Yuchi, Shixiang Zhu, Li Dong, Yigit M. Arisoy, Matthew C. Spencer

Abstract: Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this approach requires extensive data to fully understand user behavior, presenting challenges in modeling newcomers without historical knowledge. In this paper, we propose a novel discrete event prediction framework for new users through the lens of causal inference. Our method offers an unbiased prediction for new users without needing to know their categories. We treat the user event history as the "treatment" for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, with the new user model trained on an adjusted dataset where each event is re-weighted by its inverse propensity score. We demonstrate the superior performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.

Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.
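
The inverse-propensity re-weighting mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the clipping threshold, and the assumption that propensity scores are already available are all hypothetical choices made for illustration.

```python
import numpy as np

def ipw_weights(propensity, clip=1e-3):
    """Return inverse propensity weights; tiny scores are clipped for stability (assumed threshold)."""
    p = np.clip(np.asarray(propensity, dtype=float), clip, 1.0)
    return 1.0 / p

def weighted_mean(outcomes, weights):
    """Self-normalized weighted average of the observed outcomes."""
    outcomes = np.asarray(outcomes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * outcomes) / np.sum(weights))

# Hypothetical toy data: propensity of each observed event and its outcome.
propensity = np.array([0.5, 0.25, 0.25])
outcomes = np.array([1.0, 0.0, 1.0])

w = ipw_weights(propensity)            # weights are 2, 4, 4
estimate = weighted_mean(outcomes, w)  # (2*1 + 4*0 + 4*1) / 10 = 0.6
print(estimate)
```

Events that were unlikely to be observed under the logging distribution receive larger weights, so the re-weighted sample behaves more like one in which the confounder does not influence which histories are seen.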

arXiv:2403.17852 [pdf, other]

Counterfactual Fairness through Transforming Data Orthogonal to Bias

Authors: Shuyi Chen, Shixiang Zhu

Abstract: Machine learning models have shown exceptional prowess in solving complex issues across various domains. However, these models can sometimes exhibit biased decision-making, resulting in unequal treatment of different groups. Despite substantial research on counterfactual fairness, methods to reduce the impact of multivariate and continuous sensitive variables on decision-making outcomes are still underdeveloped. We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB), which is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications. Our approach, based on the assumption of a jointly normal distribution within a structural causal model (SCM), demonstrates that counterfactual fairness can be achieved by ensuring the data is orthogonal to the observed sensitive variables. The OB algorithm is model-agnostic, making it applicable to a wide range of machine learning models and tasks. Additionally, it includes a sparse variant to improve numerical stability through regularization. Empirical evaluations on both simulated and real-world datasets, encompassing settings with both discrete and continuous sensitive variables, show that our methodology effectively promotes fairer outcomes without compromising accuracy.

Submitted 29 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.
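
To make "orthogonal to the observed sensitive variables" concrete, here is a small sketch that uses plain least-squares residualization. It is a stand-in technique, not the OB algorithm from the paper: the function name `residualize`, the synthetic data, and the mixing matrix are assumptions for illustration only.

```python
import numpy as np

def residualize(X, S):
    """Remove from X the part that is linearly explained by S (ordinary least squares)."""
    X = np.asarray(X, dtype=float)
    S = np.asarray(S, dtype=float)
    Xc = X - X.mean(axis=0)  # center the features
    Sc = S - S.mean(axis=0)  # center the sensitive variables
    coef, *_ = np.linalg.lstsq(Sc, Xc, rcond=None)
    return Xc - Sc @ coef    # residuals are orthogonal to the columns of Sc

# Hypothetical synthetic data: two continuous sensitive variables, three features.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 2))
mixing = np.array([[1.0, -0.5, 0.0],
                   [0.3, 0.0, 2.0]])
X = S @ mixing + rng.normal(size=(200, 3))

X_orth = residualize(X, S)
cross = (S - S.mean(axis=0)).T @ X_orth  # numerically close to the zero matrix
print(np.abs(cross).max())
```

After this step the transformed features have zero sample covariance with each sensitive variable, which is the linear notion of orthogonality the sketch targets.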

arXiv:2402.03167 [pdf, other]

Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

Authors: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

Abstract: Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, current decentralized SBO algorithms face challenges, including expensive inner-loop updates and unclear understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structures. In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions compared to existing methods. Numerical experiments validate our theoretical findings.

Submitted 26 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 37 pages, 6 figures

arXiv:2401.18017 [pdf, ps, other]

Causal Discovery by Kernel Deviance Measures with Heterogeneous Transforms

Authors: Tim Tse, Zhitang Chen, Shengyu Zhu, Yue Liu

Abstract: The discovery of causal relationships in a set of random variables is a fundamental objective of science and has also recently been argued to be an essential component of real machine intelligence. One class of causal discovery techniques is founded on the argument that there are inherent structural asymmetries between the causal and anti-causal directions which could be leveraged in determining the direction of causation. Capturing these discrepancies between cause and effect remains a challenge, and many current state-of-the-art algorithms propose to compare the norms of the kernel mean embeddings of the conditional distributions. In this work, we argue that such approaches based on RKHS embeddings are insufficient in capturing principal markers of cause-effect asymmetry involving higher-order structural variabilities of the conditional distributions. We propose Kernel Intrinsic Invariance Measure with Heterogeneous Transform (KIIM-HT), which introduces a novel score measure based on heterogeneous transformation of RKHS embeddings to extract relevant higher-order moments of the conditional densities for causal discovery. Inference is made by comparing the score of each hypothetical cause-effect direction. Tests and comparisons on a synthetic dataset, a two-dimensional synthetic dataset and the real-world benchmark dataset Tübingen Cause-Effect Pairs verify our approach. In addition, we conduct a sensitivity analysis to the regularization parameter to faithfully compare previous work to our method and an experiment with trials on varied hyperparameter values to showcase the robustness of our algorithm.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2310.18449 [pdf, other]

Black-Box Optimization with Implicit Constraints for Public Policy

Authors: Wenqian Xing, Jungho Lee, Chong Liu, Shixiang Zhu

Abstract: Black-box optimization (BBO) has become increasingly relevant for tackling complex decision-making problems, especially in public policy domains such as police redistricting. However, its broader application in public policymaking is hindered by the complexity of defining feasible regions and the high dimensionality of decisions. This paper introduces a novel BBO framework, termed Conditional And Generative Black-box Optimization (CageBO). This approach leverages a conditional variational autoencoder to learn the distribution of feasible decisions, enabling a two-way mapping between the original decision space and a simplified, constraint-free latent space. CageBO efficiently handles the implicit constraints often found in public policy applications, allowing for optimization in the latent space while evaluating objectives in the original space. We validate our method through a case study on large-scale police redistricting problems in Atlanta, Georgia. Our results reveal that CageBO offers notable improvements in performance and efficiency compared to the baselines.

Submitted 16 August, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.03258 [pdf, other]

Assessing Electricity Service Unfairness with Transfer Counterfactual Learning

Authors: Song Wei, Xiangrui Kong, Alinson Santos Xavier, Shixiang Zhu, Yao Xie, Feng Qiu

Abstract: Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in counterfactual effects, and limited data availability. First, this paper demonstrates how one can evaluate counterfactual unfairness in a power system by analyzing the average causal effect of a specific protected attribute. Subsequently, we use subgroup analysis to handle model heterogeneity and introduce a novel method for estimating counterfactual unfairness based on transfer learning, which helps to alleviate the data scarcity in each subgroup. In our numerical analysis, we apply our method to a unique large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages under both daily and post-disaster operations, and such discrimination is exacerbated under severe conditions. These findings suggest a widespread, systematic issue of injustice in the power service systems and emphasize the necessity for focused interventions in disadvantaged communities.

Submitted 24 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The preliminary version titled "Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets" is presented at NeurIPS 2023 Workshops on Causal Representation Learning (CRL) and Algorithmic Fairness through the Lens of Time (AFT); See v1

arXiv:2310.03218 [pdf, other]

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

Authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu

Abstract: Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.

Submitted 4 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2309.01885 [pdf, other]

QuantEase: Optimization-based Quantization for Language Models

Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

Abstract: With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.

Submitted 1 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2305.15742 [pdf, other]

doi 10.1145/3637528.3671950

Counterfactual Generative Models for Time-Varying Treatments

Authors: Shenghao Wu, Wenbin Zhou, Minshuo Chen, Shixiang Zhu

Abstract: Estimating the counterfactual outcome of treatment is essential for decision-making in public health and clinical science, among others. Often, treatments are administered in a sequential, time-varying manner, leading to an exponentially increased number of possible counterfactual outcomes. Furthermore, in modern applications, the outcomes are high-dimensional and conventional average treatment effect estimation fails to capture disparities in individuals. To tackle these challenges, we propose a novel conditional generative framework capable of producing counterfactual samples under time-varying treatment, without the need for explicit density estimation. Our method carefully addresses the distribution mismatch between the observed and counterfactual distributions via a loss function based on inverse probability re-weighting, and supports integration with state-of-the-art conditional generative models such as the guided diffusion and conditional variational autoencoder. We present a thorough evaluation of our method using both synthetic and real-world data. Our results demonstrate that our method is capable of generating high-quality counterfactual samples and outperforms the state-of-the-art baselines.

Submitted 13 July, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Published at KDD'24

arXiv:2305.12569 [pdf, other]

Conditional Generative Modeling for High-dimensional Marked Temporal Point Processes

Authors: Zheng Dong, Zekai Fan, Shixiang Zhu

Abstract: Point processes offer a versatile framework for sequential event modeling. However, the computational challenges and constrained representational power of the existing point process models have impeded their potential for wider applications. This limitation becomes especially pronounced when dealing with event data that is associated with multi-dimensional or high-dimensional marks such as texts or images. To address this challenge, this study proposes a novel event-generation framework for modeling point processes with high-dimensional marks. We aim to capture the distribution of events without explicitly specifying the conditional intensity or probability density function. Instead, we use a conditional generator that takes the history of events as input and generates the high-quality subsequent event that is likely to occur given the prior observations. The proposed framework offers a host of benefits, including considerable representational power to capture intricate dynamics in multi- or even high-dimensional event space, as well as exceptional efficiency in learning the model and generating samples. Our numerical results demonstrate superior performance compared to other state-of-the-art baselines.

Submitted 14 February, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2212.08901 [pdf]

doi 10.1145/3318299.3318342

Learning with linear mixed model for group recommendation systems

Authors: Baode Gao, Guangpeng Zhan, Hanzhang Wang, Yiming Wang, Shengxin Zhu

Abstract: Accurate prediction of users' responses to items is one of the main aims of many computational advising applications. Examples include recommending movies, news articles, songs, jobs, clothes, books and so forth. Accurate prediction of inactive users' responses still remains a challenging problem for many applications. In this paper, we explore the linear mixed model in recommendation systems. The recommendation process is naturally modelled as a mixed process between objective effects (fixed effects) and subjective effects (random effects). The latent association between the subjective effects and the users' responses can be mined through the restricted maximum likelihood method. It turns out that linear mixed models can combine items' attributes and users' characteristics naturally and effectively. While this model cannot produce the most precise individual-level personalized recommendations, it is relatively fast and accurate for group (users)/class (items) recommendation. Numerical examples on GroupLens benchmark problems are presented to show the effectiveness of this method.

Submitted 17 December, 2022; originally announced December 2022.

Comments: 5 pages, 9 figures, published

ACM Class: G.3

Journal ref: In Proceedings of the 2019 11th International Conference on Machine Learning and Computing (pp. 81-85) (2019, February)

arXiv:2210.16486 [pdf, other]

Learning Probabilistic Models from Generator Latent Spaces with Hat EBM

Authors: Mitch Hill, Erik Nijkamp, Jonathan Mitchell, Bo Pang, Song-Chun Zhu

Abstract: This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM). Our formulation posits that observed images are the sum of unobserved latent variables passed through the generator network and a residual random variable that spans the gap between the generator output and the image manifold. One can then define an EBM that includes the generator as part of its forward pass, which we call the Hat EBM. The model can be trained without inferring the latent variables of the observed data or calculating the generator Jacobian determinant. This enables explicit probabilistic modeling of the output distribution of any type of generator network. Experiments show strong performance of the proposed method on (1) unconditional ImageNet synthesis at 128x128 resolution, (2) refining the output of existing generators, and (3) learning EBMs that incorporate non-probabilistic generators. Code and pretrained models to reproduce our results are available at https://github.com/point0bar1/hat-ebm.

Submitted 12 January, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022 camera ready

arXiv:2206.08531 [pdf, ps, other]

Reframed GES with a Neural Conditional Dependence Measure

Authors: Xinwei Shen, Shengyu Zhu, Jiji Zhang, Shoubo Hu, Zhitang Chen

Abstract: In a nonparametric setting, the causal structure is often identifiable only up to Markov equivalence, and for the purpose of causal inference, it is useful to learn a graphical representation of the Markov equivalence class (MEC). In this paper, we revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the MEC of the underlying causal structure. We observe that in order to make the GES algorithm consistent in a nonparametric setting, it is not necessary to design a scoring metric that evaluates graphs. Instead, it suffices to plug in a consistent estimator of a measure of conditional dependence to guide the search. We therefore present a reframing of the GES algorithm, which is more flexible than the standard score-based version and readily lends itself to the nonparametric setting with a general measure of conditional dependence. In addition, we propose a neural conditional dependence (NCD) measure, which utilizes the expressive power of deep neural networks to characterize conditional independence in a nonparametric manner. We establish the optimality of the reframed GES algorithm under standard assumptions and the consistency of using our NCD estimator to decide conditional independence. Together these results justify the proposed approach. Experimental results demonstrate the effectiveness of our method in causal discovery, as well as the advantages of using our NCD measure over kernel-based measures.

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted to UAI 2022

arXiv:2205.12243 [pdf, other]

EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

Authors: Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu

Abstract: This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories. MCMC trajectories of different lengths correspond to models with different purposes. Our experiments cover three different trajectory magnitudes and learning outcomes: 1) shortrun sampling for image generation; 2) midrun sampling for classifier-agnostic adversarial defense; and 3) longrun sampling for principled modeling of image probability densities. To achieve these outcomes, we introduce three novel methods of MCMC initialization for negative samples used in Maximum Likelihood (ML) learning. With standard network architectures and an unaltered ML objective, our MCMC initialization methods alone enable significant performance gains across the three applications that we investigate. Our results include state-of-the-art FID scores for unnormalized image densities on the CIFAR-10 and ImageNet datasets; state-of-the-art adversarial defense on CIFAR-10 among purification methods and the first EBM defense on ImageNet; and scalable techniques for learning valid probability densities. Code for this project can be found at https://github.com/point0bar1/ebm-life-cycle.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2203.11528 [pdf, other]

Out-of-distribution Generalization with Causal Invariant Transformations

Authors: Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu

Abstract: In real-world applications, it is important and desirable to learn a model that performs well on out-of-distribution (OOD) data. Recently, causality has become a powerful tool to tackle the OOD generalization problem, with the idea resting on the causal mechanism that is invariant across domains of interest. To leverage the generally unknown causal mechanism, existing works assume a linear form of causal feature or require sufficiently many and diverse training domains, which are usually restrictive in practice. In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature. Our approach is based on transformations that modify the non-causal feature but leave the causal part unchanged, which can be either obtained from prior knowledge or learned from the training data in the multi-domain scenario. Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model across the domains using only single domain data. Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations. Based on the theoretical findings, a regularized training procedure is proposed to improve the OOD generalization capability. Extensive experimental results on both synthetic and real datasets verify the effectiveness of the proposed algorithm, even with only a few causal invariant transformations.

Submitted 23 March, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022

arXiv:2111.15155 [pdf, other]

gCastle: A Python Toolbox for Causal Discovery

Authors: Keli Zhang, Shengyu Zhu, Marcus Kalander, Ignavier Ng, Junjian Ye, Zhitang Chen, Lujia Pan

Abstract: $\texttt{gCastle}$ is an end-to-end Python toolbox for causal structure learning. It provides functionalities of generating data from either simulator or real-world dataset, learning causal structure from the data, and evaluating the learned graph, together with useful practices such as prior knowledge insertion, preliminary neighborhood selection, and post-processing to remove false discoveries. Compared with related packages, $\texttt{gCastle}$ includes many recently developed gradient-based causal discovery methods with optional GPU acceleration. $\texttt{gCastle}$ brings convenience to researchers who may directly experiment with the code as well as practitioners with a graphical user interface. Three real-world datasets in telecommunications are also provided in the current version. $\texttt{gCastle}$ is available under Apache License 2.0 at \url{https://github.com/huawei-noah/trustworthyAI/tree/master/gcastle}.

Submitted 30 November, 2021; originally announced November 2021.

Comments: Tech report describing the gCastle toolbox. More details can be found in the GitHub repository https://github.com/huawei-noah/trustworthyAI/tree/master/gcastle

arXiv:2111.05529 [pdf, other]

Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Authors: Sicheng Zhu, Bang An, Furong Huang

Abstract: Machine learning models that are developed with invariance to certain types of data transformations have demonstrated superior generalization performance in practice. However, the underlying mechanism that explains why invariance leads to better generalization is not well-understood, limiting our ability to select appropriate data transformations for a given dataset. This paper studies the generalization benefit of model invariance by introducing the sample cover induced by transformations, i.e., a representative subset of a dataset that can approximately recover the whole dataset using transformations. Based on this notion, we refine the generalization bound for invariant models and characterize the suitability of a set of data transformations by the sample covering number induced by transformations, i.e., the smallest size of its induced sample covers. We show that the generalization bound can be tightened for suitable transformations that have a small sample covering number. Moreover, our proposed sample covering number can be empirically evaluated, providing a practical guide for selecting transformations to develop model invariance for better generalization. We evaluate the sample covering numbers for commonly used transformations on multiple datasets and demonstrate that the smaller sample covering number for a set of transformations indicates a smaller gap between the test and training error for invariant models, thus validating our propositions.

Submitted 23 February, 2023; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: Accepted to NeurIPS 2021. Version 2 includes several content clarifications and image format revisions

arXiv:2109.09711 [pdf, other]

Quantifying Grid Resilience Against Extreme Weather Using Large-Scale Customer Power Outage Data

Authors: Shixiang Zhu, Rui Yao, Yao Xie, Feng Qiu, Yueming Qiu, Xuan Wu

Abstract: In recent decades, the weather around the world has become more irregular and extreme, often causing large-scale extended power outages. Resilience -- the capability of withstanding, adapting to, and recovering from a large-scale disruption -- has become a top priority for the power sector. However, the understanding of power grid resilience still stays on the conceptual level mostly or focuses on particular components, yielding no actionable results or revealing few insights on the system level. This study provides a quantitatively measurable definition of power grid resilience, using a statistical model inspired by patterns observed from data and domain knowledge. We analyze a large-scale quarter-hourly historical electricity customer outage dataset and the corresponding weather records, and draw connections between the model and industry resilience practice. We showcase the resilience analysis using three major service territories on the east coast of the United States. Our analysis suggests that cumulative weather effects play a key role in causing immediate, sustained outages, and these outages can propagate and cause secondary outages in neighboring areas. The proposed model also provides some interesting insights into grid resilience enhancement planning. For example, our simulation results indicate that enhancing the power infrastructure in a small number of critical locations can reduce nearly half of the number of customer power outages in Massachusetts. In addition, we have shown that our model achieves promising accuracy in predicting the progress of customer power outages throughout extreme weather events, which can be very valuable for system operators and federal agencies to prepare disaster response.

Submitted 4 September, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

arXiv:2109.09029 [pdf, other]

Non-stationary spatio-temporal point process modeling for high-resolution COVID-19 data

Authors: Zheng Dong, Shixiang Zhu, Yao Xie, Jorge Mateu, Francisco J. Rodríguez-Cortés

Abstract: Most COVID-19 studies commonly report figures of the overall infection at a state- or county-level. This aggregation tends to miss out on fine details of virus propagation. In this paper, we analyze a high-resolution COVID-19 dataset in Cali, Colombia, that records the precise time and location of every confirmed case. We develop a non-stationary spatio-temporal point process equipped with a neural network-based kernel to capture the heterogeneous correlations among COVID-19 cases. The kernel is carefully crafted to enhance expressiveness while maintaining model interpretability. We also incorporate some exogenous influences imposed by city landmarks. Our approach outperforms the state-of-the-art in forecasting new COVID-19 cases with the capability to offer vital insights into the spatio-temporal interaction between individuals concerning the disease spread in a metropolis.

Submitted 9 March, 2023; v1 submitted 18 September, 2021; originally announced September 2021.

arXiv:2108.13285 [pdf, other]

Multi-resolution spatio-temporal prediction with application to wind power generation

Authors: Zheng Dong, Hanyu Zhang, Shixiang Zhu, Yao Xie, Pascal Van Hentenryck

Abstract: Wind energy is becoming an increasingly crucial component of a sustainable grid, but its inherent variability and limited predictability present challenges for grid operators. The energy sector needs novel forecasting techniques that can precisely predict the generation of renewable power and offer precise quantification of prediction uncertainty. This will facilitate well-informed decision-making by operators who wish to integrate renewable energy into the power grid. This paper presents a novel approach to wind speed prediction with uncertainty quantification using a multi-resolution spatio-temporal Gaussian process. By leveraging information from multiple sources of predictions with varying accuracies and uncertainties, the joint framework provides a more accurate and robust prediction of wind speed while measuring the uncertainty in these predictions. We assess the effectiveness of our proposed framework using real-world wind data obtained from the Midwest region of the United States. Our results demonstrate that the framework enables predictors with varying data resolutions to learn from each other, leading to an enhancement in overall predictive performance. The proposed framework shows a superior performance compared to other state-of-the-art methods. The goal of this research is to improve grid operation and management by aiding system operators and policymakers in making better-informed decisions related to energy demand management, energy storage system deployment, and energy supply scheduling. This results in potentially further integration of renewable energy sources into the existing power systems.

Submitted 2 December, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2106.10773 [pdf, other]

Neural Spectral Marked Point Processes

Authors: Shixiang Zhu, Haoyun Wang, Zheng Dong, Xiuyuan Cheng, Yao Xie

Abstract: Self- and mutually-exciting point processes are popular models in machine learning and statistics for dependent discrete event data. To date, most existing models assume stationary kernels (including the classical Hawkes processes) and simple parametric models. Modern applications with complex event data require more general point process models that can incorporate contextual information of the events, called marks, besides the temporal and location information. Moreover, such applications often require non-stationary models to capture more complex spatio-temporal dependence. To tackle these challenges, a key question is to devise a versatile influence kernel in the point process model. In this paper, we introduce a novel and general neural network-based non-stationary influence kernel with high expressiveness for handling complex discrete events data while providing theoretical performance guarantees. We demonstrate the superior performance of our proposed method compared with the state-of-the-art on synthetic and real data.

Submitted 12 February, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

arXiv:2106.01694 [pdf, ps, other]

Analysis and Evaluation of the Inequality of the Spatial Distribution of Medical Resources in Jinan

Authors: Shengkun Zhu

Abstract: This article will analyze the inequality and evaluation of the spatial distribution of medical resources in Jinan. The research will be carried out from the following four aspects: analysis of existing medical resource allocation and distribution characteristics, medical resource accessibility analysis, inequality evaluation and optimization layout analysis. The article will use G2SFCA/M2SFCA Model, Spatial Clustering Analysis and HRAD.

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2106.00072 [pdf, other]

doi 10.1109/JSTSP.2022.3154972

Early Detection of COVID-19 Hotspots Using Spatio-Temporal Data

Authors: Shixiang Zhu, Alexander Bukharin, Liyan Xie, Khurram Yamin, Shihao Yang, Pinar Keskinocak, Yao Xie

Abstract: Recently, the Centers for Disease Control and Prevention (CDC) has worked with other federal agencies to identify counties with increasing coronavirus disease 2019 (COVID-19) incidence (hotspots) and offers support to local health departments to limit the spread of the disease. Understanding the spatio-temporal dynamics of hotspot events is of great importance to support policy decisions and prevent large-scale outbreaks. This paper presents a spatio-temporal Bayesian framework for early detection of COVID-19 hotspots (at the county level) in the United States. We assume both the observed number of cases and hotspots depend on a class of latent random variables, which encode the underlying spatio-temporal dynamics of the transmission of COVID-19. Such latent variables follow a zero-mean Gaussian process, whose covariance is specified by a non-stationary kernel function. The most salient feature of our kernel function is that deep neural networks are introduced to enhance the model's representative power while still enjoying the interpretability of the kernel. We derive a sparse model and fit the model using a variational learning strategy to circumvent the computational intractability for large data sets. Our model demonstrates better interpretability and superior hotspot-detection performance compared to other baseline methods.

Submitted 31 October, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

arXiv:2102.12685 [pdf, other]

doi 10.1016/j.artint.2022.103669

A Local Method for Identifying Causal Relations under Markov Equivalence

Authors: Zhuangyan Fang, Yue Liu, Zhi Geng, Shengyu Zhu, Yangbo He

Abstract: Causality is important for designing interpretable and robust methods in artificial intelligence research. We propose a local approach to identify whether a variable is a cause of a given target under the framework of causal graphical models of directed acyclic graphs (DAGs). In general, the causal relation between two variables may not be identifiable from observational data as many causal DAGs encoding different causal relations are Markov equivalent. In this paper, we first introduce a sufficient and necessary graphical condition to check the existence of a causal path from a variable to a target in every Markov equivalent DAG. Next, we provide local criteria for identifying whether a variable is a cause/non-cause of a target based only on the local structure instead of the entire graph. Finally, we propose a local learning algorithm for this causal query via learning the local structure of the variable and some additional statistical independence tests related to the target. Simulation studies show that our local algorithm is efficient and effective, compared with other state-of-the-art methods.

Submitted 5 March, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

arXiv:2009.07356 [pdf, other]

High-resolution Spatio-temporal Model for County-level COVID-19 Activity in the U.S.

Authors: Shixiang Zhu, Alexander Bukharin, Liyan Xie, Mauricio Santillana, Shihao Yang, Yao Xie

Abstract: We present an interpretable high-resolution spatio-temporal model to estimate COVID-19 deaths together with confirmed cases one-week ahead of the current time, at the county-level and weekly aggregated, in the United States. A notable feature of our spatio-temporal model is that it considers the (a) temporal auto- and pairwise correlation of the two local time series (confirmed cases and death of the COVID-19), (b) dynamics between locations (propagation between counties), and (c) covariates such as local within-community mobility and social demographic factors. The within-community mobility and demographic factors, such as total population and the proportion of the elderly, are included as important predictors since they are hypothesized to be important in determining the dynamics of COVID-19. To reduce the model's high-dimensionality, we impose sparsity structures as constraints and emphasize the impact of the top ten metropolitan areas in the nation, which we refer to (and treat within our models) as hubs in spreading the disease. Our retrospective out-of-sample county-level predictions were able to forecast the subsequently observed COVID-19 activity accurately. The proposed multi-variate predictive models were designed to be highly interpretable, with clear identification and quantification of the most important factors that determine the dynamics of COVID-19. Ongoing work involves incorporating more covariates, such as education and income, to improve prediction accuracy and model interpretability.

Submitted 20 August, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:2006.10259 [pdf, other]

On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

Authors: Ruiqi Gao, Jianwen Xie, Xue-Xin Wei, Song-Chun Zhu, Ying Nian Wu

Abstract: Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher dimensional vector, and the 2D self-motion is represented by a general transformation of the vector. We identify two conditions on the transformation. One is a group representation condition that is necessary for path integration. The other is an isotropic scaling condition that ensures locally conformal embedding, so that the error in the vector representation translates conformally to the error in the 2D self-position. Then we investigate the simplest transformation, i.e., the linear transformation, uncover its explicit algebraic and geometric structure as matrix Lie group of rotation, and explore the connection between the isotropic scaling condition and a special class of hexagon grid patterns. Finally, with our optimization-based approach, we manage to learn hexagon grid patterns that share similar properties of the grid cells in the rodent brain. The learned model is capable of accurate long distance path integration. Code is available at https://github.com/ruiqigao/grid-cell-path.

Submitted 3 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2006.09439 [pdf, other]

Goodness-of-Fit Test for Mismatched Self-Exciting Processes

Authors: Song Wei, Shixiang Zhu, Minghe Zhang, Yao Xie

Abstract: Recently there have been many research efforts in developing generative models for self-exciting point processes, partly due to their broad applicability for real-world applications. However, rarely can we quantify how well the generative model captures the nature or ground-truth since it is usually unknown. The challenge typically lies in the fact that the generative models typically provide, at most, good approximations to the ground-truth (e.g., through the rich representative power of neural networks), but they cannot be precisely the ground-truth. We thus cannot use the classic goodness-of-fit (GOF) test framework to evaluate their performance. In this paper, we develop a GOF test for generative models of self-exciting processes by making a new connection to this problem with the classical statistical theory of Quasi-maximum-likelihood estimator (QMLE). We present a non-parametric self-normalizing statistic for the GOF test: the Generalized Score (GS) statistic, and explicitly capture the model misspecification when establishing the asymptotic distribution of the GS statistic. Numerical simulation and real-data experiments validate our theory and demonstrate the proposed GS test's good performance.

Submitted 12 February, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 28 pages, 11 figures, 3 tables. Accepted to AISTATS 2021. Camera-ready version

MSC Class: 62G10 (Primary) 62L10; 62E20 (Secondary)

ACM Class: G.3

arXiv:2006.08205 [pdf, other]

Learning Latent Space Energy-Based Prior Model

Authors: Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

Abstract: We propose to learn energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that stands on the top-down network of the generator model. Both the latent space EBM and the top-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well. We show that the learned model exhibits strong performances in terms of image and text generation and anomaly detection. The one-page code can be found in supplementary materials.

Submitted 29 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020 Camera-Ready

arXiv:2006.06897 [pdf, other]

MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC

Authors: Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

Abstract: Learning energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space is generally not mixing, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space. This is a serious handicap for both theory and practice of EBMs. In this paper, we propose to learn an EBM with a flow-based model (or in general a latent variable model) serving as a backbone, so that the EBM is a correction or an exponential tilting of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the backbone model, and MCMC sampling of the EBM in the latent space mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs.

Submitted 16 March, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2006.06649 [pdf, other]

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

Authors: Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu

Abstract: The goal of neural-symbolic computation is to integrate the connectionist and symbolist paradigms. Prior methods learn the neural-symbolic models using reinforcement learning (RL) approaches, which ignore the error propagation in the symbolic reasoning module and thus converge slowly with sparse rewards. In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the \textbf{grammar} model as a \textit{symbolic prior} to bridge neural perception and symbolic reasoning, and (2) proposing a novel \textbf{back-search} algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently. We further interpret the proposed learning framework as maximum likelihood estimation using Markov chain Monte Carlo sampling and the back-search algorithm as a Metropolis-Hastings sampler. The experiments are conducted on two weakly-supervised neural-symbolic tasks: (1) handwritten formula recognition on the newly introduced HWF dataset; (2) visual question answering on the CLEVR dataset. The results show that our approach significantly outperforms the RL methods in terms of performance, converging speed, and data efficiency. Our code and data are released at \url{https://liqing-ustc.github.io/NGS}.

Submitted 27 July, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: ICML 2020. Project page: https://liqing-ustc.github.io/NGS

arXiv:2006.05691 [pdf, other]

On Low Rank Directed Acyclic Graphs and Causal Structure Learning

Authors: Zhuangyan Fang, Shengyu Zhu, Jiji Zhang, Yue Liu, Zhitang Chen, Yangbo He

Abstract: Despite several advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high dimensional settings when the graphs to be learned are not sparse. In this paper, we propose to exploit a low rank assumption regarding the (weighted) adjacency matrix of a DAG causal model to help address this problem. We utilize existing low rank techniques to adapt causal structure learning methods to take advantage of this assumption and establish several useful results relating interpretable graphical conditions to the low rank assumption. Specifically, we show that the maximum rank is highly related to hubs, suggesting that scale-free networks, which are frequently encountered in practice, tend to be low rank. Our experiments demonstrate the utility of the low rank adaptations for a variety of data models, especially with relatively large and dense graphs. Moreover, with a validation procedure, the adaptations maintain a superior or comparable performance even when graphs are not restricted to be low rank.

Submitted 15 May, 2023; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: This paper has been accepted by the IEEE Transactions on Neural Networks and Learning Systems

arXiv:2006.04004 [pdf, other]

Distributionally Robust Weighted $k$-Nearest Neighbors

Authors: Shixiang Zhu, Liyan Xie, Minghe Zhang, Rui Gao, Yao Xie

Abstract: Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that captures similarities between samples. When the samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors, which aims to find the optimal weighted $k$-NN classifiers that hedge against feature uncertainties. We develop an algorithm, \texttt{Dr.k-NN}, that efficiently solves this functional optimization problem and features in assigning minimax optimal weights to training samples when performing classification. These weights are class-dependent, and are determined by the similarities of sample features under the least favorable scenarios. When the size of the uncertainty set is properly tuned, the robust classifier has a smaller Lipschitz norm than the vanilla $k$-NN, and thus improves the generalization capability. We also couple our framework with neural-network-based feature embedding. We demonstrate the competitive performance of our algorithm compared to the state-of-the-art in the few-training-sample setting with various real-data experiments.

Submitted 16 February, 2022; v1 submitted 6 June, 2020; originally announced June 2020.

arXiv:2005.13525 [pdf, other]

Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models

Authors: Mitch Hill, Jonathan Mitchell, Song-Chun Zhu

Abstract: The vulnerability of deep networks to adversarial attacks is a central problem for deep learning from the perspective of both cognition and security. The current most successful defense method is to train a classifier using adversarial images created during learning. Another defense approach involves transformation or purification of the original input to remove adversarial signals before the image is classified. We focus on defending naturally-trained classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based Model (EBM) for adversarial purification. In contrast to adversarial training, our approach is intended to secure pre-existing and highly vulnerable classifiers. The memoryless behavior of long-run MCMC sampling will eventually remove adversarial signals, while metastable behavior preserves consistent appearance of MCMC samples after many steps to allow accurate long-run prediction. Balancing these factors can lead to effective purification and robust classification. We evaluate adversarial defense with an EBM using the strongest known attacks against purification. Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses and from which the EOT attack naturally follows, and 3) state-of-the-art adversarial defense for naturally-trained classifiers and competitive defense compared to adversarially-trained classifiers on Cifar-10, SVHN, and Cifar-100. Code and pre-trained models are available at https://github.com/point0bar1/ebm-defense.

Submitted 18 March, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: ICLR 2021

arXiv:2005.08665 [pdf, other]

Spatio-Temporal Point Processes with Attention for Traffic Congestion Event Modeling

Authors: Shixiang Zhu, Ruyi Ding, Minghe Zhang, Pascal Van Hentenryck, Yao Xie

Abstract: We present a novel framework for modeling traffic congestion events over road networks. Using multi-modal data by combining count data from traffic sensors with police reports that report traffic incidents, we aim to capture two types of triggering effect for congestion events. Current traffic congestion at one location may cause future congestion over the road network, and traffic incidents may cause spread traffic congestion. To model the non-homogeneous temporal dependence of the event on the past, we use a novel attention-based mechanism based on neural networks embedding for point processes. To incorporate the directional spatial dependence induced by the road network, we adapt the "tail-up" model from the context of spatial statistics to the traffic network setting. We demonstrate our approach's superior performance compared to the state-of-the-art methods for both synthetic and real data.

Submitted 31 May, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:2005.04354 [pdf, ps, other]

Exact Asymptotics for Learning Tree-Structured Graphical Models with Side Information: Noiseless and Noisy Samples

Authors: Anshoo Tandon, Vincent Y. F. Tan, Shiyao Zhu

Abstract: Given side information that an Ising tree-structured graphical model is homogeneous and has no external field, we derive the exact asymptotics of learning its structure from independently drawn samples. Our results, which leverage the use of probabilistic tools from the theory of strong large deviations, refine the large deviation (error exponents) results of Tan, Anandkumar, Tong, and Willsky [IEEE Trans. on Inform. Th., 57(3):1714--1735, 2011] and strictly improve those of Bresler and Karzand [Ann. Statist., 2020]. In addition, we extend our results to the scenario in which the samples are observed in random noise. In this case, we show that they strictly improve on the recent results of Nikolakakis, Kalogerias, and Sarwate [Proc. AISTATS, 1771--1782, 2019]. Our theoretical results demonstrate keen agreement with experimental results for sample sizes as small as that in the hundreds.

Submitted 8 May, 2020; originally announced May 2020.

arXiv:2004.09660 [pdf, other]

Data-Driven Optimization for Police Beat Design in South Fulton, Georgia

Authors: Shixiang Zhu, Alexander W. Bukharin, Le Lu, He Wang, Yao Xie

Abstract: We redesign the police patrol beat in South Fulton, Georgia, in collaboration with the South Fulton Police Department (SFPD), using a predictive data-driven optimization approach. Due to rapid urban development and population growth, the existing police beat design done in the 1970s is far from efficient, which leads to low policing efficiency and long 911 call response time. We balance the police workload among different city regions, improve operational efficiency, and reduce 911 call response time by redesigning beat boundaries for the SFPD. We discretize the city into small geographical atoms, which correspond to our decision variables; the decision is to map the atoms into "beats", the basic unit of the police operation. We first analyze workload and trend in each atom using the rich dataset, including police incident reports and U.S. census data; we then predict future police workload for each atom using spatial statistical regression models; lastly, we formulate the optimal beat design as a mixed-integer program (MIP) with continuity and compactness constraints on the beats' shape. The optimization problem is solved using simulated annealing due to its large-scale and non-convex nature. The simulation results suggest that our proposed beat design can reduce workload variance among beats by over 90%.

Submitted 23 August, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2002.11798 [pdf, other]

Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization

Authors: Sicheng Zhu, Xiao Zhang, David Evans

Abstract: Training machine learning models that are robust against adversarial inputs poses seemingly insurmountable challenges. To better understand adversarial robustness, we consider the underlying problem of learning robust representations. We develop a notion of representation vulnerability that captures the maximum change of mutual information between the input and output distributions, under the worst-case input perturbation. Then, we prove a theorem that establishes a lower bound on the minimum adversarial risk that can be achieved for any downstream classifier based on its representation vulnerability. We propose an unsupervised learning method for obtaining intrinsically robust representations by maximizing the worst-case mutual information between the input and output distributions. Experiments on downstream classification tasks support the robustness of the representations found using unsupervised learning with our training principle.

Submitted 5 July, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: ICML 2020

arXiv:2002.07281 [pdf, other]

Deep Fourier Kernel for Self-Attentive Point Processes

Authors: Shixiang Zhu, Minghe Zhang, Ruyi Ding, Yao Xie

Abstract: We present a novel attention-based model for discrete event data to capture complex non-linear temporal dependence structures. We borrow the idea from the attention mechanism and incorporate it into the point processes' conditional intensity function. We further introduce a novel score function using Fourier kernel embedding, whose spectrum is represented using neural networks, which drastically differs from the traditional dot-product kernel and can capture a more complex similarity structure. We establish our approach's theoretical properties and demonstrate our approach's competitive performance compared to the state-of-the-art for synthetic and real data.

Submitted 21 February, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:2001.03311 [pdf, other]

Guess First to Enable Better Compression and Adversarial Robustness

Authors: Sicheng Zhu, Bang An, Shiyu Niu

Abstract: Machine learning models are generally vulnerable to adversarial examples, which is in contrast to the robustness of humans. In this paper, we try to leverage one of the mechanisms in human recognition and propose a bio-inspired classification framework in which model inference is conditioned on label hypothesis. We provide a class of training objectives for this framework and an information bottleneck regularizer which utilizes the advantage that label information can be discarded during inference. This framework enables better compression of the mutual information between inputs and latent representations without loss of learning capacity, at the cost of tractable inference complexity. Better compression and elimination of label information further bring better adversarial robustness without loss of natural accuracy, which is demonstrated in the experiment.

Submitted 10 January, 2020; originally announced January 2020.

Comments: Accepted by NeurIPS 2019 workshop on Information Theory and Machine Learning

arXiv:1912.01909 [pdf, other]

Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

Authors: Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Carlo (MCMC), which can be time consuming. In this paper, we propose to use noise initialized non-persistent short run MCMC, such as finite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The advantage of the proposed method is that it is simple and automatic without the need to design an inference model.

Submitted 17 July, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

arXiv:1911.11374 [pdf, other]

Representation Learning: A Statistical Perspective

Authors: Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

Abstract: Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.

Submitted 26 November, 2019; originally announced November 2019.

Journal ref: Annual Review of Statistics and Its Application 2020

arXiv:1911.11185 [pdf, other]

Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning

Authors: Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, Song-Chun Zhu

Abstract: Learning transferable knowledge across similar but different settings is a fundamental component of generalized intelligence. In this paper, we approach the transfer learning challenge from a causal theory perspective. Our agent is endowed with two basic yet general theories for transfer learning: (i) a task shares a common abstract structure that is invariant across domains, and (ii) the behavior of specific features of the environment remains constant across domains. We adopt a Bayesian perspective of causal theory induction and use these theories to transfer knowledge between environments. Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment. A hierarchy of Bayesian structures is used to model abstract-level structural causal knowledge, and an instance-level associative learning scheme learns which specific objects can be used to induce state changes through interaction. This model-learning scheme is then integrated with a model-based planner to achieve a task in the OpenLock environment, a virtual "escape room" with a complex hierarchy that requires agents to reason about an abstract, generalized causal structure. We compare performances against a set of predominant model-free reinforcement learning (RL) algorithms. RL agents showed poor ability to transfer learned knowledge across different trials, whereas the proposed model revealed similar performance trends as human learners and, more importantly, demonstrated transfer behavior across trials and learning situations.

Submitted 25 November, 2019; originally announced November 2019.

Comments: Accepted to AAAI 2020 as an oral

arXiv:1911.07420 [pdf, other]

A Graph Autoencoder Approach to Causal Structure Learning

Authors: Ignavier Ng, Shengyu Zhu, Zhitang Chen, Zhuangyan Fang

Abstract: Causal structure learning has been a challenging task in the past decades and several mainstream approaches such as constraint- and score-based methods have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following this recent state-of-the-art work, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that on synthetic datasets, our proposed method outperforms other gradient-based methods significantly, especially on large causal graphs. We further investigate the scalability and efficiency of our method, and observe a near linear training time when scaling up the graph size.

Submitted 17 November, 2019; originally announced November 2019.

Comments: NeurIPS 2019 Workshop "Do the right thing": machine learning and causal inference for improved decision making

arXiv:1911.00685 [pdf]

Sparse inversion for derivative of log determinant

Authors: Shengxin Zhu, Andrew J Wathen

Abstract: Algorithms for Gaussian processes, marginal likelihood methods, or restricted maximum likelihood methods often require derivatives of log determinant terms. These log determinants are usually parametric with variance parameters of the underlying statistical models. This paper demonstrates how, when the underlying matrix is sparse, to take advantage of sparse inversion (selected inversion, which shares the same sparsity as the original matrix) to accelerate evaluating the derivative of the log determinant.

Submitted 2 November, 2019; originally announced November 2019.

Comments: 15

MSC Class: 65F05; 90C53

arXiv:1910.09161 [pdf, other]

Sequential Adversarial Anomaly Detection for One-Class Event Data

Authors: Shixiang Zhu, Henry Shaowu Yuchi, Minghe Zhang, Yao Xie

Abstract: We consider the sequential anomaly detection problem in the one-class setting when only the anomalous sequences are available and propose an adversarial sequential detector by solving a minimax problem to find an optimal detector against the worst-case sequences from a generator. The generator captures the dependence in sequential events using the marked point process model. The detector sequentially evaluates the likelihood of a test sequence and compares it with a time-varying threshold, also learned from data through the minimax problem. We demonstrate our proposed method's good performance using numerical experiments on simulations and proprietary large-scale credit card fraud datasets. The proposed method can generally apply to detecting anomalous sequences.

Submitted 5 April, 2023; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1910.08527 [pdf, other]

Masked Gradient-Based Causal Structure Learning

Authors: Ignavier Ng, Shengyu Zhu, Zhuangyan Fang, Haoyang Li, Zhitang Chen, Jun Wang

Abstract: This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) with additive noises in a form parameterized by binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to super-graphs of the true causal graph under mild conditions. We then utilize the reformulated SEM to develop a causal structure learning method that can be efficiently trained using gradient-based optimization, by leveraging a smooth characterization on acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. It is found that the obtained entries are typically near zero or one and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily includes different smooth model functions and achieves a much improved performance on most datasets considered.

Submitted 10 January, 2022; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: Accepted to SDM 2022

arXiv:1909.04324 [pdf, other]

Inducing Hierarchical Compositional Model by Sparsifying Generator Network

Authors: Xianglei Xing, Tianfu Wu, Song-Chun Zhu, Ying Nian Wu

Abstract: This paper proposes to learn a hierarchical compositional AND-OR model for interpretable image synthesis by sparsifying the generator network. The proposed method adopts the scene-objects-parts-subparts-primitives hierarchy in image representation. A scene has different types (i.e., OR), each of which consists of a number of objects (i.e., AND). This can be recursively formulated across the scene-objects-parts-subparts hierarchy and is terminated at the primitive level (e.g., wavelets-like basis). To realize this AND-OR hierarchy in image synthesis, we learn a generator network that consists of the following two components: (i) Each layer of the hierarchy is represented by an over-complete set of convolutional basis functions. Off-the-shelf convolutional neural architectures are exploited to implement the hierarchy. (ii) Sparsity-inducing constraints are introduced in end-to-end training, which induces a sparsely activated and sparsely connected AND-OR model from the initially densely connected generator network. A straightforward sparsity-inducing constraint is utilized, namely allowing only the top-$k$ basis functions to be activated at each layer (where $k$ is a hyper-parameter). The learned basis functions are also capable of image reconstruction to explain the input images. In experiments, the proposed method is tested on four benchmark datasets. The results show that meaningful and interpretable hierarchical representations are learned, with better image synthesis and reconstruction quality than baselines.

Submitted 20 June, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

Comments: This is the CVPR version

arXiv:1909.00513 [pdf, other]

Causal Discovery by Kernel Intrinsic Invariance Measure

Authors: Zhitang Chen, Shengyu Zhu, Yue Liu, Tim Tse

Abstract: Reasoning based on causality, instead of association, has been considered a key ingredient towards real machine intelligence. However, it is a challenging task to infer causal relationship/structure among variables. In recent years, an Independent Mechanism (IM) principle was proposed, stating that the mechanism generating the cause and the one mapping the cause to the effect are independent. As the conjecture, it is argued that in the causal direction, the conditional distributions instantiated at different values of the conditioning variable have less variation than in the anti-causal direction. Existing state-of-the-art methods simply compare the variance of the RKHS mean embedding norms of these conditional distributions. In this paper, we prove that this norm-based approach sacrifices important information of the original conditional distributions. We propose a Kernel Intrinsic Invariance Measure (KIIM) to capture higher order statistics corresponding to the shapes of the density functions. We show our algorithm can be reduced to an eigen-decomposition task on a kernel matrix measuring intrinsic deviance/invariance. Causal directions can then be inferred by comparing the KIIM scores of two hypothetical directions. Experiments on synthetic and real data are conducted to show the advantages of our methods over existing solutions.

Submitted 1 September, 2019; originally announced September 2019.

Comments: 9 pages, preprint

arXiv:1908.10037 [pdf, ps, other]

Asymptotically Optimal One- and Two-Sample Testing with Kernels

Authors: Shengyu Zhu, Biao Chen, Zhitang Chen, Pengfei Yang

Abstract: We characterize the asymptotic performance of nonparametric one- and two-sample testing. The exponential decay rate or error exponent of the type-II error probability is used as the asymptotic performance metric, and an optimal test achieves the maximum rate subject to a constant level constraint on the type-I error probability. With Sanov's theorem, we derive a sufficient condition for one-sample tests to achieve the optimal error exponent in the universal setting, i.e., for any distribution defining the alternative hypothesis. We then show that two classes of Maximum Mean Discrepancy (MMD) based tests attain the optimal type-II error exponent on $\mathbb R^d$, while the quadratic-time Kernel Stein Discrepancy (KSD) based tests achieve this optimality with an asymptotic level constraint. For general two-sample testing, however, Sanov's theorem is insufficient to obtain a similar sufficient condition. We proceed to establish an extended version of Sanov's theorem and derive an exact error exponent for the quadratic-time MMD based two-sample tests. The obtained error exponent is further shown to be optimal among all two-sample tests satisfying a given level constraint. Our work hence provides an achievability result for optimal nonparametric one- and two-sample testing in the universal setting. Applications to off-line change detection and related issues are also discussed.

Submitted 5 February, 2021; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: Accepted to IEEE Transactions on Information Theory. This version may be further modified

arXiv:1906.05467 [pdf, other]

Imitation Learning of Neural Spatio-Temporal Point Processes

Authors: Shixiang Zhu, Shuang Li, Zhigang Peng, Yao Xie

Abstract: We present a novel Neural Embedding Spatio-Temporal (NEST) point process model for spatio-temporal discrete event data and develop an efficient imitation learning (a type of reinforcement learning) based approach for model fitting. Despite the rapid development of one-dimensional temporal point processes for discrete event data, the study of spatial-temporal aspects of such data is relatively scarce. Our model captures complex spatio-temporal dependence between discrete events by carefully designing a mixture of heterogeneous Gaussian diffusion kernels, whose parameters are parameterized by neural networks. This new kernel is the key that allows our model to capture intricate spatial dependence patterns and yet still lead to interpretable results as we examine maps of Gaussian diffusion kernel parameters. The imitation learning model fitting for the NEST is more robust than the maximum likelihood estimate. It directly measures the divergence between the empirical distributions of the training data and the model-generated data. Moreover, our imitation learning-based approach enjoys computational efficiency due to the explicit characterization of the reward function related to the likelihood function; furthermore, the likelihood function under our model enjoys tractable expression due to Gaussian kernel parameterization. Experiments based on real data show our method's good performance relative to the state-of-the-art and the good interpretability of NEST's result.

Submitted 22 January, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

Showing 1–50 of 102 results for author: Zhu, S